Abstract

The studies of the Higgs boson couplings based on the recent and upcoming LHC data 
open up a new window on physics beyond the Standard Model. In this paper, we propose 
a statistical guide to the consistent treatment of the theoretical uncertainties entering the 
Higgs rate fits. Both the Bayesian and frequentist approaches are systematically analysed 
in a unified formalism. We present analytical expressions for the marginal likelihoods, 
useful to implement simultaneously the experimental and theoretical uncertainties. We 
review the various origins of the theoretical errors (QCD, EFT, PDF, production mode 
contamination...). All these individual uncertainties are thoroughly combined with the 
help of moment-based considerations. The theoretical correlations among Higgs detection
channels appear to affect the location and size of the best-fit regions in the space of Higgs
couplings. We discuss the recurrent question of the shape of the prior distributions for 
the individual theoretical errors and find that a nearly Gaussian prior arises from the error 
combinations. We also develop the bias approach, which is an alternative to marginalisation 
providing more conservative results. The statistical framework to apply the bias principle 
is introduced and two realisations of the bias are proposed. Finally, depending on the 
statistical treatment, the Standard Model prediction for the Higgs signal strengths is found 
to lie within either the 68% or 95% confidence level region obtained from the latest analyses 
of the 7 and 8 TeV LHC datasets. 


* sylvain@ift.unesp.br
† moreau@th.u-psud.fr



Contents

1 Introduction and summary
2 Statistical preliminaries
2.1 Need-to-know frequentist and Bayesian statistics
2.2 Treatment of nuisance parameters
2.2.1 Marginalisation principle
2.2.2 Bias principle
3 Combinations of theoretical uncertainties
3.1 Error modelisation
3.2 Bayesian combination of theoretical uncertainties
3.3 Frequentist combination of theoretical uncertainties
3.4 The leading moment approximation
3.5 Combining uncertainties in the bias approach
4 The Higgs boson rates
4.1 The data
4.2 New physics parametrisation
5 The Higgs likelihood
5.1 The base likelihood
5.2 The uncertainty on the signal strengths
5.3 The structure of the Higgs theoretical uncertainties
6 Combining the Higgs rate uncertainties
6.1 Combining the PDF and α_s uncertainties
6.2 Scale and EFT errors: the amplitude uncertainties
6.3 Combination of the PDF and amplitude errors
6.4 The production contamination
6.5 The uncertainties on branching ratios
6.6 Summary
7 Marginalising the Higgs likelihood
7.1 Correlations of the detection channels
7.2 The Bayesian analytical likelihood
7.3 The frequentist treatment
7.3.1 The marginal likelihood
7.3.2 The frequentist analytical likelihood
7.4 Numerical results
7.4.1 The forbidden case: no correlations
7.4.2 Flat prior
7.4.3 Gaussian prior
7.4.4 The nuisance parameters favoured by the data
7.4.5 More conservative theoretical errors
8 Biasing the Higgs likelihood
8.1 Combining the uncertainties
8.2 The Bayesian approach
8.2.1 Extremal bias
8.2.2 Envelope method
8.3 The frequentist approach
8.3.1 Extremal bias
8.3.2 Envelope method
8.4 Numerical results
8.4.1 Extremal bias
8.4.2 Envelope method
9 Conclusions
A The leading moment approximation


1 Introduction and summary 

Besides the historical discovery of a resonance around 125 GeV [1, 2] that is most probably
the Brout-Englert-Higgs boson responsible for ElectroWeak (EW) symmetry breaking [3],
the ATLAS and CMS Collaborations have provided a set of 88 rate measurements, based on the
full dataset collected so far with integrated luminosities of ~5 fb⁻¹ at the center-of-mass
energy √s = 7 TeV and ~20 fb⁻¹ at √s = 8 TeV [4, 5] (see also Refs. [6, 7]). This set
constitutes a new and precious source of indirect information on physics beyond the Standard
Model (SM). Indeed, observing deviations of the Higgs boson rates with respect to their
SM predictions would reveal the presence of an underlying theory, while the absence of such
deviations allows one to strongly constrain new models (see for example Ref. [8] for
higher-dimensional models, Ref. [9] for composite Higgs theories and Ref. [10] for supersymmetric
scenarios). So far, no signs of an unknown world have come out of the data, but this
is only the beginning of a long exploration, given the expected LHC upgrades [11].

The fits of the Higgs rates (cf. Ref. [12] for the first set of analyses, Refs. [13-16] for
the results after the Moriond 2013 winter conference and Refs. [4, 5] for the latest official
ATLAS and CMS analyses) are thus obviously important. However, certain aspects of these
analyses remain to be worked out in order to obtain the final fits for testing new physics.
First, the precise likelihood functions associated with the experimental rates (in particular
their specific shapes and the complete correlations between channels) are not provided in
the present public papers, although they might be expected at some point. Second, a major
part of the theoretical uncertainties is due to QCD calculations of the Higgs production
rates [17-20], and their treatment in the fits raises questions in the Higgs physics community
(see Refs. [21, 22] for recent discussions). Carefully taking these theoretical uncertainties
into account is crucial for the Higgs fits, for the following reasons.

First, theoretical uncertainties can be sizeable with respect to the experimental ones.
The QCD uncertainty on the gluon-gluon fusion mechanism, which contributes dominantly to most
of the Higgs discovery channels, typically induces an error of ~10% on the signal strengths
(see Section 6). This is already comparable to the experimental error bars in several Higgs
channels, which reach values down to ~20% [4-7]. Besides, considering for instance the
CMS projections at √s = 14 TeV with an integrated luminosity of 300 fb⁻¹, the experimental error
bars are around ~5% (assuming the same systematic errors as today) for the diphoton final state
and less than ~10% for the τ-lepton, Z and W boson channels [11], so that the theoretical
error might even become the dominant one in some channels.

Second, theoretical uncertainties might be of the same magnitude as the main potential
deviations due to new physics. For instance, the maximal corrections to Higgs couplings
estimated in Ref. [23] for characteristic composite Higgs and supersymmetric models typically
lead to deviations of the signal strengths between ~2% and tens of percent compared
to the SM. This is of the same order as the theoretical error mentioned above, so that one is
precisely in the situation where the theoretical error deserves a careful treatment to test
new physics scenarios.¹

In the case of no new states, related to the EW symmetry breaking, directly observed at the LHC.

¹This intermediate situation is to be contrasted with the two extreme cases of expected signal
strength deviations much larger than the theoretical error (which can then be neglected) or much
smaller than it (no hope of detecting them). In both of these cases, a detailed treatment of the
theoretical error would not really be needed to test new physics scenarios.

Therefore, in this paper, our primary goal is to answer precisely the question: what
is the correct treatment of the theoretical uncertainties in the fits of the Higgs boson rates?
This seemingly simple question has led us to several new developments, summarized in
the three lines of work described in the paragraphs below.

First, we present a systematic survey of the various statistical treatments of the theoretical
error and their applications to the Higgs fits within a unified formalism. We confront
the frequentist and Bayesian frameworks,³ ⁴ which turn out to exhibit a certain degree of
convergence at the level of accuracy of the present LHC data.⁵ We also compare the
marginalisation and bias treatments. In the former, we consider the representative cases of
Gaussian and flat combined priors, because of the lack of knowledge inherent to the
distribution of theoretical uncertainties.⁶ We find the Gaussian prior to be well motivated by
the full combination of each individual theoretical uncertainty. It turns out that the choice
of one among all these statistical approaches may significantly affect the determination of
the Higgs properties. It is thus important to understand precisely the conceptual differences
between these approaches. Finally, this survey provides the opportunity to give useful
analytical expressions for the marginalised likelihood functions, including the theoretical
correlations among the Higgs channels.

Second, we explain precisely the principle of bias⁷ and its fundamental differences with
the marginalisation principle. By construction, the bias principle is more conservative than the
marginalisation principle, and it does not depend on the shape of the priors of the
nuisance parameters. This thorough examination of the bias principle naturally leads us to
introduce a statistical framework for biasing. We propose two realisations of the bias,
referred to as the extremal bias and the envelope method, which apply in both frequentist and
Bayesian contexts. Regarding the error combinations, important differences arise between
the marginalisation and bias frameworks.⁸

²Throughout this paper, we generically use the expression “theoretical error” to denote any error on the
SM prediction for the Higgs rates. This is a slight abuse of wording, because some of these errors, like the
ones from the PDF determination, have a partially experimental origin.

³Sometimes in the literature, there are inconsistencies in the sense that errors are combined in a
frequentist way (combination depending on the prior shape) while the priors are convolved in a Bayesian way
(convolution via integrations).

⁴A pure Bayesian fit of the Higgs rates has been carried out in Ref. [16].

⁵To be contrasted with the preliminary study of Ref. [24] based on simulated Higgs data.

⁶To the best of our knowledge, this is the first time a flat prior for the theoretical uncertainty is applied to
the Higgs fits. Notice also that the combination in quadrature of the theoretical and experimental errors,
sometimes made in the literature, is equivalent to a marginalisation assuming Gaussian distributions for
both sources of errors and neglecting the correlations. This is true in both the frequentist and Bayesian cases.

⁷The bias has been applied once, in Ref. [14]. The analysis developed here improves on the bias performed in
Ref. [14] by including more effects, such as the production contamination, the individual scale/EFT/PDF errors,
the branching fraction uncertainties, the correlations between Higgs channels and the Bayesian/frequentist
cases.

⁸For example, the PDF and amplitude uncertainties for the ggF mechanism are summed in quadrature
in the Bayesian marginalisation, whereas they are summed linearly in the bias approach.
Third, we discuss and implement several improvements in the treatment of the theoretical
uncertainties. (i) For the cross sections, the combinations of all the individual uncertainties
are discussed exhaustively, including in particular the several errors that make up the parton
distribution function (PDF) uncertainty. The so-called leading moment approximation is developed
to facilitate the combination of such a large number of errors. (ii) The error contamination by various
production modes and the errors on the Higgs branching ratios are taken into account. (iii)
The correlations between the theoretical errors on the various Higgs detection channels are
included.⁹ We show that these theoretical correlations induce significant shifts of the
best-fit regions in the Higgs coupling parameter space. (iv) A Higgs fit with more conservative
theoretical errors is shown to illustrate the potential impact of the imperfect knowledge
of the magnitude of these errors.

For each of the statistical approaches developed along these three lines of work, we provide
up-to-date Higgs fit results based on the latest available data from the 7 and 8 TeV
LHC, which can readily be used for new physics tests. On the theory side, we have updated
the dominant gluon-gluon fusion mechanism by using its reduced perturbative QCD
error, obtained from the recent calculation up to N³LO [25]. We have also included the
theoretical uncertainty on this production mode due to the use of an Effective Field Theory in
the amplitude calculation [25-27], so that the whole error on the cross section remains at
~10%.


2 Statistical preliminaries 

This section condenses the basic elements of frequentist and Bayesian statistics that will
be used throughout the paper. In addition to these statistical basics, the principle of bias is also
presented.

2.1 Need-to-know frequentist and Bayesian statistics 

In order to extract some information about a new physics model from a set of data, the 
central quantity to study is the likelihood function [28]. The likelihood function is equal 
to the conditional probability density for obtaining the observed data, taken as a function of 
the hypothesis. In the case of predictions made in a given hypothesis H with n parameters
θ = {θ_n}, the likelihood function reads

$$ L(\theta) = p(d \mid H, \theta) \,, \qquad (2.1) $$

where d represents the set of data. Note that the likelihood is defined up to an overall factor.
In the present work, the data we will consider are the set of signal strength measurements
from the LHC and the Tevatron, described in Section 4.1.

In particle physics, the likelihood function encodes a statistical uncertainty associated
with the data. This is the uncertainty coming from the fluctuations inherent to the
observation of a quantum process. This statistical uncertainty tends to zero in the limit of
a large amount of data. However, other sources of uncertainty can be present, on both
the experimental and the theoretical side. For example, uncertainties arise from the finite
resolution of a detector, or from the finite accuracy of a computation. These systematic
uncertainties do not depend on the amount of data and need to be taken into account carefully.
In this paper, we take a close look at the theoretical systematic uncertainties.

⁹We notice that such correlations were included, e.g., in Ref. [15], for the specific assumption of errors
with Gaussian priors and neglecting the correlations among different Higgs production modes.

¹⁰Note that this is an abuse of language: the likelihood function is actually a distribution.

The starting point for modelling a systematic uncertainty is to parametrise it explicitly.
Namely, one introduces a set of new parameters, δ = {δ_i}, which explicitly modify the
likelihood,

$$ L(\theta, \delta) \,. \qquad (2.2) $$

These new parameters are called nuisance parameters, as opposed to the θ's, which are
considered as the parameters of interest. This step of parametrisation is common to the
frequentist and Bayesian frameworks and is fairly universal. Discrepancies appear in
the way the δ's are treated, and these will be the focus of our attention in the rest of the
paper. Two fundamentally different points of view on how to treat the nuisance parameters,
denoted as marginalisation and bias, will be identified below (in both the frequentist and
Bayesian contexts).

In Bayesian statistics, model parameters are genuine random variables. They are
associated with a so-called prior distribution, denoted π(θ). In order to carry out a process
of inference (for example, setting exclusion bounds), the relevant object to study is the
posterior distribution,

$$ p(H, \theta \mid d) \propto L(\theta)\, \pi(\theta) \,. \qquad (2.3) $$

In this framework, a so-called 1 − α Bayesian credible region is defined by the domain
Θ_α = {θ | p(H, θ|d) > p_α}, where p_α is determined by the fraction of integrated posterior,

$$ \frac{\int_{\Theta_\alpha} d\theta\, p(H, \theta \mid d)}{\int_{\Omega} d\theta\, p(H, \theta \mid d)} = 1 - \alpha \,, \qquad (2.4) $$

Ω being the whole parameter space. The 1 − α Bayesian Credible (BC) contour is the
boundary of Θ_α, and it corresponds to the contour level defined as {θ | p(H, θ|d) = p_α}. In
what follows, we use the BC contours at

$$ 1 - \alpha = \{68.27\%,\ 95.45\%,\ 99.73\%\} \,. \qquad (2.5) $$
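The following minimal Python sketch illustrates definition (2.4). It assumes a toy standard bivariate Gaussian posterior evaluated on a grid; the posterior, the grid and the function name `hpd_threshold` are hypothetical choices made only for this example. The code accumulates the grid cells of highest posterior density until the requested fraction 1 − α is reached, then reports −2 ln(p_α/p_max). For a Gaussian posterior, these numbers should come out close to 2.30, 6.18 and 11.83, the bivariate χ² thresholds used in the frequentist case.

```python
import math

def hpd_threshold(cred_level, half_width=6.0, step=0.03):
    """Return -2 ln(p_alpha / p_max) for the highest-posterior-density region
    holding a fraction cred_level of a toy standard bivariate Gaussian posterior."""
    n = int(round(2 * half_width / step)) + 1
    xs = [-half_width + i * step for i in range(n)]
    # Unnormalised posterior p(theta|d) ∝ exp(-(x^2 + y^2)/2) on the grid;
    # all cells have the same area, so density values can be summed directly
    dens = sorted((math.exp(-0.5 * (x * x + y * y)) for x in xs for y in xs),
                  reverse=True)
    target = cred_level * sum(dens)
    acc = 0.0
    p_alpha = dens[-1]
    for p in dens:
        acc += p
        if acc >= target:
            p_alpha = p  # lowest density still inside the credible region
            break
    return -2.0 * math.log(p_alpha / dens[0])

for level in (0.6827, 0.9545, 0.9973):
    print(level, round(hpd_threshold(level), 2))
```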

In frequentist statistics, the likelihood function is employed to build a statistical test,
for instance one based on the likelihood ratio,

$$ q(\theta) = -2 \ln \frac{L(\theta)}{\max_{\theta'} L(\theta')} \,. \qquad (2.6) $$

The probability density function (pdf) of this test statistic is then computed by simulation
(typically using Monte Carlo pseudo-data). The pdf of q(θ), denoted f_q, can then be used to
evaluate a p-value, typically of the form

$$ p(\theta) = \int_{q_d}^{\infty} f_q(q' \mid \theta)\, dq' \,, \qquad (2.7) $$

where q_d is the value given by the actual data. The 1 − α confidence regions are then
obtained by solving p(θ) = α, i.e. the confidence regions are given by Θ_α = {θ | p(θ) > α}.
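The simulation step behind Eq. (2.7) can be sketched with pseudo-data for the simplest possible case, assumed here only for illustration: a single Gaussian measurement d of one parameter θ with known error σ, so that q(θ) = (d − θ)²/σ². The names `toy_p_value` and `d_obs`, as well as the random seed, are hypothetical. In this Gaussian case, the Monte Carlo p-value should agree with the χ² result for one degree of freedom, P(q ≥ q_d) = erfc(√(q_d/2)).

```python
import math
import random

def toy_p_value(theta, d_obs, sigma=1.0, n_toys=100_000, seed=1):
    """Monte Carlo estimate of the p-value (2.7) for one Gaussian measurement."""
    rng = random.Random(seed)

    def q(d):
        # Likelihood ratio statistic: theta_hat = d, so q = ((d - theta)/sigma)^2
        return ((d - theta) / sigma) ** 2

    q_d = q(d_obs)
    # Pseudo-data generated under the hypothesis theta; count toys with q' >= q_d
    n_exceed = sum(1 for _ in range(n_toys) if q(rng.gauss(theta, sigma)) >= q_d)
    return n_exceed / n_toys

p_mc = toy_p_value(theta=0.0, d_obs=1.5)
p_chi2 = math.erfc(math.sqrt(1.5 ** 2 / 2))  # chi^2 with 1 dof at q_d = 2.25
print(round(p_mc, 3), round(p_chi2, 3))
```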

Whenever the likelihood is Gaussian, q follows a χ² distribution. One then has
1 − α = F_{χ²_n}(q_α), where F_{χ²_n} is the cumulative distribution function of a χ² with n degrees
of freedom. Confidence regions can thus be obtained by plotting the contours q(θ) = q_α. This simpler
procedure is commonly used in the literature, even when the likelihood is not Gaussian. We adopt
this procedure throughout this paper. In the case where the likelihoods are bivariate (which will
be the case in our example of Higgs fit), we adopt the threshold values

$$ q_\alpha = \{2.30,\ 6.18,\ 11.83\} \,. \qquad (2.8) $$

In the Gaussian limit, these values exactly match the confidence levels 1 − α = {68.27%,
95.45%, 99.73%}.
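The threshold values (2.8) follow from the χ² cumulative distribution with n = 2, for which F(q) = 1 − e^{−q/2}, so that q_α = −2 ln α. The short Python check below uses only the standard library and recovers both the confidence levels and the thresholds:

```python
import math

# Gaussian coverage of 1, 2 and 3 standard deviations: 1 - alpha
levels = [math.erf(k / math.sqrt(2)) for k in (1, 2, 3)]
# chi^2 with n = 2 degrees of freedom: F(q) = 1 - exp(-q/2), hence q_alpha = -2 ln(alpha)
alphas = [math.erfc(k / math.sqrt(2)) for k in (1, 2, 3)]
thresholds = [-2.0 * math.log(a) for a in alphas]

print([round(100 * cl, 2) for cl in levels])  # → [68.27, 95.45, 99.73]
print([round(q, 2) for q in thresholds])      # → [2.3, 6.18, 11.83]
```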


2.2 Treatment of nuisance parameters 
2.2.1 Marginalisation principle 

Having introduced the nuisance parameters δ in the likelihood L(θ, δ), the next step is to
eliminate them. This will effectively deform the likelihood, enlarging the preferred regions
and possibly shifting their central values. In the Bayesian framework, this is naturally done
by integrating over δ, so that

$$ L_B(\theta) = \int_{\mathcal{D}} d\delta\, L(\theta, \delta)\, \pi(\delta) \,, \qquad (2.9) $$

where π(δ) is the prior distribution for the δ parameters, defined over the domain D. This
operation is called marginalisation. In the frequentist framework, the likelihood is instead
maximised,

$$ L_F(\theta) = \max_{\delta \in \mathcal{D}} \left[ L(\theta, \delta)\, \pi(\delta) \right] . \qquad (2.10) $$

This operation is usually called profiling. Here, however, in order to emphasise the parallel
between the Bayesian and frequentist cases, we also refer to it as “marginalisation”. The
outcomes of Bayesian and frequentist marginalisation are, respectively, the marginal likelihoods
L_B and L_F. The best-fit regions are then obtained by using L_B and L_F in Eqs. (2.4)
and (2.6), respectively. Finally, let us notice that in the frequentist case, the marginalisation
operation clearly has the effect of selecting the values of δ preferred by the data.

¹¹In classical frequentist statistics, hypotheses and parameters are not associated with probabilities. In
this paper, for the frequentist side, we adopt the more general framework of hybrid Bayesian-frequentist
statistics, in which a distribution can be attributed to a nuisance parameter. Conceptually, such a distribution
cannot be seen as a prior pdf, but corresponds to the likelihood for a real or imaginary measurement
constraining the nuisance parameter (see Ref. [57], p. 4). However, by abuse of language, we will sometimes
use the term “prior” in frequentist statistics as well. Classical frequentist statistics are recovered by giving
a flat shape to these frequentist “prior” distributions.

¹²Recall that we have defined δ as a set of nuisance parameters, δ = {δ_i}. The subsequent integrations
and maximisations will thus be multidimensional.
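To see the two marginalisation operations (2.9) and (2.10) side by side, consider a toy model, assumed here purely for illustration. It consists of one measurement d with experimental error σ of a prediction θ + Δδ, where δ has a standard Gaussian prior. The Python sketch below integrates and maximises over δ on a grid; the numbers and helper names are hypothetical. In this linear Gaussian case, both L_B and L_F are proportional to exp[−(d − θ)²/(2(σ² + Δ²))]. The experimental and theoretical errors therefore end up combined in quadrature, so for the chosen numbers both printed Δχ² values should be close to 0.2.

```python
import math

# Toy model (assumption): one measurement d with experimental error sigma of the
# prediction theta + Delta * delta; delta has a standard Gaussian prior.
d, sigma, Delta = 1.0, 0.2, 0.1
step = 0.001
deltas = [-8.0 + i * step for i in range(16001)]  # grid over the domain of delta

def likelihood(theta, delta):
    return math.exp(-0.5 * ((d - theta - Delta * delta) / sigma) ** 2)

def prior(delta):
    return math.exp(-0.5 * delta ** 2) / math.sqrt(2.0 * math.pi)

def L_bayes(theta):
    # Eq. (2.9): integrate over delta (Riemann sum on the grid)
    return step * sum(likelihood(theta, x) * prior(x) for x in deltas)

def L_freq(theta):
    # Eq. (2.10): maximise over delta on the same grid (profiling)
    return max(likelihood(theta, x) * prior(x) for x in deltas)

shift = 0.1
dchi2 = {name: -2.0 * math.log(L(d + shift) / L(d))
         for name, L in (("Bayesian", L_bayes), ("frequentist", L_freq))}
print(dchi2)  # both close to shift**2 / (sigma**2 + Delta**2) = 0.2
```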

2.2.2 Bias principle 

The common feature of Bayesian and frequentist marginalisations is that the nuisance
parameters contribute to the goodness of fit. This implies that the nuisance parameters can relax a
tension among various measurements, which in turn induces a shift of the best-fit regions.
In the context of the search for new physics, such a shift could also be characteristic of the
presence of a new physics signal. It is thus of the utmost importance to correctly understand
the effects of nuisance parameters, in order not to confuse systematic uncertainties with
the presence of new physics!

In order to explicitly expose the shifts induced by nuisance parameters, and ultimately
obtain more conservative results, a useful approach is to define a new operation, alternative
to marginalisation, with the requirement that the nuisance parameters do not contribute to
the goodness of fit. We will refer to this principle as bias, as opposed to the marginalisation
principle. We will see that the bias principle provides results that are independent of the
shape of the prior of the nuisance parameters.

The bias principle can be intuitively grasped as follows. Consider the likelihood L(θ, δ)
with a single nuisance parameter on the interval δ ∈ [δ_a, δ_b]. Instead of marginalising over δ,
one can look at the contours of the likelihood for various discrete values of δ, say δ = δ_a, δ_b.
For each value of δ, the contours are given by Eq. (2.4) (Bayesian) or Eq. (2.6) (frequentist).
In obtaining these contours, the likelihood is normalised separately for δ_a and δ_b, and
this normalisation is in general not the same for δ_a and δ_b. Because of this normalisation
factor, no particular value of δ is preferred by the fit. It is this normalisation factor that
concretely realises the bias principle.

In Bayesian statistics, the bias principle finds a general realisation as follows. The
requirement one wants to implement is that the nuisance parameters δ do not contribute
to the goodness of fit. This is equivalent to asking that δ has no preferred region once the
data are taken into account. To translate this condition formally, the relevant quantity
is the marginal posterior of δ, p(δ|d). To implement the bias principle, one should
thus require p(δ|d) to be constant, which translates into the condition

$$ \frac{\partial}{\partial \delta}\, p(\delta \mid d) = 0 \,, \qquad (2.11) $$

with

$$ p(\delta \mid d) = \int_{\Omega} d\theta\, L(\theta, \delta)\, \pi(\delta)\, \pi(\theta) \,. \qquad (2.12) $$

We see that the condition (2.11) fixes the π(δ) prior to be

$$ \pi(\delta) = \frac{1}{\int_{\Omega} d\theta'\, L(\theta', \delta)\, \pi(\theta')} \,. \qquad (2.13) $$
This peculiar prior is not independent of the data, and is thus not orthodox with respect
to the usual Bayesian philosophy. This is an expected consequence of biasing, and all
quantities are nevertheless well defined. It follows that the posterior for θ and δ has the form
L(θ, δ) π(θ) / ∫dθ' [L(θ', δ) π(θ')]. The Bayesian bias likelihood is then given by marginalising
this particular posterior with respect to the nuisance parameters,

$$ L_B(\theta) = \int_{\mathcal{D}} d\delta\, \frac{L(\theta, \delta)}{\int_{\Omega} d\theta'\, L(\theta', \delta)\, \pi(\theta')} \,. \qquad (2.14) $$

In frequentist statistics, the bias principle is realised in a very similar way to the
Bayesian case. The quantity telling how δ is constrained by the data is the marginal likelihood
for δ (with its associated “prior”), max_θ [L(θ, δ) π(θ) π(δ)], which selects the preferred
θ for a given δ. One requires this marginal likelihood to be constant,

$$ \frac{\partial}{\partial \delta} \max_{\theta \in \Omega} \left[ L(\theta, \delta)\, \pi(\theta)\, \pi(\delta) \right] = 0 \,. \qquad (2.15) $$


This implies that the π(δ) “prior” satisfies

$$ \pi(\delta) = \frac{1}{\max_{\theta \in \Omega} \left[ L(\theta, \delta)\, \pi(\theta) \right]} \,. \qquad (2.16) $$


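The marginal likelihood of θ is then given by

$$ L_F(\theta) = \max_{\delta \in \mathcal{D}} \frac{L(\theta, \delta)}{\max_{\theta' \in \Omega} \left[ L(\theta', \delta)\, \pi(\theta') \right]} \,. \qquad (2.17) $$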


This operation is sometimes referred to as the envelope method. This is because, for a
continuous domain D, it draws continuous regions which are wider than the ones obtained
by marginalising.

Comparing the Bayesian and frequentist realisations of the bias principle, Eqs. (2.14)
and (2.17), it appears that the resulting bias operations are closely analogous: the
expressions (2.14) and (2.17) are identical up to interchanging maximisation and
integration.

Let us finally comment on the best-fit regions for the bias likelihoods. The Bayesian
bias is a particular case of Bayesian marginalisation with a well-chosen prior. The contours
are thus obtained by integration, using L_B in Eq. (2.4). For the frequentist bias, the
bias likelihood L_F can be treated using the usual likelihood ratio test and computing the
associated p-value, as described in Eq. (2.6). We conclude that the best-fit regions for both
the Bayesian and frequentist bias are well-defined.
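The effect of the Bayesian bias prior (2.13) can be checked on a toy model with a multiplicative error of the form (3.1) introduced in Section 3.1. The model, assumed here for illustration, is a single measurement d with error σ of the prediction θ(1 + Δδ), with a flat π(θ) and δ ∈ [−1, 1]. For this model ∫dθ L(θ, δ) ∝ 1/(1 + Δδ), so Eq. (2.13) gives π(δ) ∝ 1 + Δδ. The Python sketch below (helper names are hypothetical) evaluates the marginal posterior (2.12). With a flat prior on δ, the data pull δ towards negative values, and p(δ = −1)/p(δ = +1) = (1 + Δ)/(1 − Δ). With the bias prior, p(δ|d) comes out flat, as required by (2.11).

```python
import math

# Toy model (assumption): prediction theta * (1 + Delta * delta), cf. Eq. (3.1),
# for one measurement d with error sigma; flat pi(theta); delta in D = [-1, 1].
d, sigma, Delta = 1.0, 0.2, 0.1
h = 0.001
thetas = [-1.0 + i * h for i in range(4001)]  # grid over Omega = [-1, 3]
deltas = [-1.0 + i * 0.1 for i in range(21)]  # grid over D

def likelihood(theta, delta):
    mu = theta * (1.0 + Delta * delta)
    return math.exp(-0.5 * ((d - mu) / sigma) ** 2)

def flat_prior(delta):
    return 1.0

def bias_prior(delta):
    # Eq. (2.13) for this model: the theta-integral of L equals
    # sqrt(2 pi) sigma / (1 + Delta delta), hence pi(delta) ∝ 1 + Delta delta
    return 1.0 + Delta * delta

def marginal_posterior(delta, prior):
    # Eq. (2.12) with flat pi(theta): integrate L(theta, delta) pi(delta) over theta
    return h * sum(likelihood(t, delta) for t in thetas) * prior(delta)

p_flat = [marginal_posterior(x, flat_prior) for x in deltas]
p_bias = [marginal_posterior(x, bias_prior) for x in deltas]
print("flat prior: p(-1)/p(+1) =", round(p_flat[0] / p_flat[-1], 4))  # (1+Delta)/(1-Delta)
print("bias prior: max/min     =", round(max(p_bias) / min(p_bias), 4))  # flat posterior
```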

Let us make an important comment that will prove useful for the frequentist treatments
in Section 8. For a single δ in the discrete domain D = {δ_a, δ_b}, the best-fit regions
obtained by inserting the likelihood (2.17) in Eq. (2.6) exactly reproduce the ones in the
discrete version of the bias described earlier in this subsection. Indeed, the normalised
likelihood (2.17) leads to a denominator equal to one in Eq. (2.6), and the role of
this denominator in the contour definition is played instead by the denominator of
Eq. (2.17).

¹³Using L = e^{−χ²/2}, one has the equivalent formulation of the envelope method in terms of χ²,

$$ \chi^2_F(\theta) = \min_{\delta \in \mathcal{D}} \left[ \chi^2(\theta, \delta) - 2 \log \pi(\theta) - \min_{\theta' \in \Omega} \left[ \chi^2(\theta', \delta) - 2 \log \pi(\theta') \right] \right] . \qquad (2.18) $$

In the case of classical frequentist statistics, π(θ) is a constant, so that the two log π terms cancel.

In this paper, we refer to the general realisations of the bias principle given by
Eqs. (2.14) and (2.17) as the envelope method, for both the Bayesian and frequentist versions.
In contrast, the discrete version of the bias introduced previously can be seen as a minimal
realisation of this principle. We refer to it as the extremal bias, again for both
the Bayesian and frequentist versions.
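Both realisations can be compared in the χ² form (2.18), taking a classical flat π so that the log π terms cancel. The toy model, assumed here for illustration, is χ²(θ, δ) = [(d − θ − Δδ)/σ]² with δ ∈ D = [−1, 1]. For this model, min_θ' χ²(θ', δ) = 0, since θ can always absorb δ. The envelope method minimises over the whole interval, while the extremal bias only uses its endpoints δ = ±1. The Python sketch below (names are hypothetical) shows that the envelope χ² is exactly zero on the plateau |θ − d| ≤ Δ, whereas the extremal bias leaves a small bump there. Outside the plateau, the two coincide.

```python
# Toy model (assumption): chi2(theta, delta) = ((d - theta - Delta*delta)/sigma)**2,
# classical frequentist case (flat pi), nuisance parameter delta in D = [-1, 1].
d, sigma, Delta = 1.0, 0.2, 0.1

def chi2(theta, delta):
    return ((d - theta - Delta * delta) / sigma) ** 2

def chi2_min_theta(delta):
    # min over theta' of chi2(theta', delta): zero here, reached at theta' = d - Delta*delta
    return 0.0

def biased_chi2(theta, deltas):
    # Eq. (2.18) with constant pi(theta): minimise over the allowed values of delta
    return min(chi2(theta, x) - chi2_min_theta(x) for x in deltas)

envelope_grid = [-1.0 + i / 1000.0 for i in range(2001)]  # continuous D, discretised
extremal_set = (-1.0, 1.0)                                 # only the endpoints of D

for theta in (d, d + Delta / 2, d + Delta, d + Delta + sigma):
    print(f"theta={theta:.2f}  envelope={biased_chi2(theta, envelope_grid):.4f}"
          f"  extremal={biased_chi2(theta, extremal_set):.4f}")
```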

3 Combinations of theoretical uncertainties 

This section applies to any systematic uncertainty. Nevertheless, since in this paper our
main focus is on theoretical uncertainties, we will readily use this term. In the previous
section, we have seen that the correct procedure to incorporate theoretical uncertainties
into the likelihood is to model these uncertainties using nuisance parameters and treat them
using either the marginalisation or the bias approach. From a practical point of view, this
step of marginalisation can be computationally heavy to carry out, in both the Bayesian
and frequentist cases. Indeed, for n nuisance parameters, either an n-dimensional integration
or an n-dimensional maximisation has to be done for each point in the space of parameters of
interest, and the cost of such an operation typically grows exponentially with n.

Because of the cost of exact marginalisation, it is common practice in the high-energy
physics community to combine certain uncertainties in a preliminary step, before carrying
out the marginalisation. This approach of “preliminary combinations” should
be followed with some care, because it can be approximate and may contain implicit
assumptions. In this section, we revisit and develop the various operations of preliminary
combination on a firm statistical ground.

3.1 Error modelisation 

Let Q be an arbitrary quantity entering into a base likelihood L[Q]. The uncertainty on
Q can be modelled via a dependence of the form

$$ Q \to Q \times [1 + \delta \Delta] \,, \qquad (3.1) $$

where δ is the nuisance parameter, associated with a distribution π(δ) defined over the
domain D. Here and throughout this paper, without loss of generality, we let all the δ's
follow a “standard distribution”, such that all the information about the magnitude of the
uncertainty is contained in the coefficient Δ. With this parametrisation, Δ represents
the relative uncertainty associated with Q. The linear model (3.1) is valid for any π
distribution, provided that the magnitude of the relative error is small, Δ ≪ 1. The actual
definition of π depends on the statistical approach adopted. In the Bayesian case, δ is
a random variable, so that one chooses E[δ] = 0, V[δ] = 1. Note that the domain of
δ can be either finite or infinite. In the hybrid frequentist case, one can follow the same
conventions as for the Bayesian case. The classical frequentist case is equivalent to having
a flat π, and one sets the domain to be D = [−1, 1] in that case. For the errors we will
consider, π will always be centred on zero.
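Under the conventions above, a “standard” π has zero mean and unit variance. The Python sketch below checks this numerically, assuming a midpoint-rule integration (the helper `moments` is hypothetical). It does so for a flat distribution on [−√3, √3], which is the flat shape with unit variance, and for the standard Gaussian. Note that in the classical frequentist convention D = [−1, 1], the flat π has standard deviation 1/√3, so Δ is then the half-width of the error interval rather than its standard deviation.

```python
import math

def moments(pdf, lo, hi, n=200_000):
    """Mean and variance of a 1D density on [lo, hi] (midpoint rule)."""
    h = (hi - lo) / n
    xs = [lo + (i + 0.5) * h for i in range(n)]
    ws = [pdf(x) * h for x in xs]
    norm = sum(ws)
    mean = sum(x * w for x, w in zip(xs, ws)) / norm
    var = sum((x - mean) ** 2 * w for x, w in zip(xs, ws)) / norm
    return mean, var

half = math.sqrt(3.0)  # a flat density on [-sqrt(3), sqrt(3)] has unit variance

def flat(x):
    return 1.0 / (2.0 * half) if abs(x) <= half else 0.0

def gauss(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

for name, pdf, lo, hi in (("flat", flat, -half, half), ("gauss", gauss, -10.0, 10.0)):
    mean, var = moments(pdf, lo, hi)
    print(name, round(mean, 6), round(var, 6))
```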


3.2 Bayesian combination of theoretical uncertainties 

In the Bayesian framework, a nuisance parameter δ is rigorously taken as a random variable
with prior distribution π. In the presence of several nuisance parameters, one may wish to
combine various sources of error, say δ_A and δ_B. A combination of these sources can be
done if they systematically appear in a single combination inside the likelihood,
L[δ_AΔ_A + δ_BΔ_B]. One can then define the combined error δ_CΔ_C ≡ δ_AΔ_A + δ_BΔ_B, so that

$$ L[\delta_C \Delta_C]\, \pi_C(\delta_C) \propto \int d\delta_A\, d\delta_B\; \delta[\delta_A \Delta_A + \delta_B \Delta_B - \delta_C \Delta_C]\; L[\delta_A \Delta_A + \delta_B \Delta_B]\; \pi_{A,B}(\delta_A, \delta_B) \,, \qquad (3.2) $$

where δ[x] is the Dirac distribution. Here π_{A,B} is the common prior of δ_A, δ_B. If these are
independent, one has π_{A,B}(δ_A, δ_B) = π_A(δ_A) π_B(δ_B). Note that the integration over δ_C of
the left-hand side of this equation recovers Eq. (2.9).

When δ_A and δ_B are independent, Eq. (3.2) implies that the distribution of δ_C is
exactly given by a convolution product,

$$ \frac{1}{\Delta_C}\, \pi_C\!\left( \frac{x_C}{\Delta_C} \right) = \int dx\, \frac{1}{\Delta_A}\, \pi_A\!\left( \frac{x}{\Delta_A} \right) \frac{1}{\Delta_B}\, \pi_B\!\left( \frac{x_C - x}{\Delta_B} \right) . \qquad (3.3) $$

The variable x can be seen as δ_AΔ_A. It is convenient to define π̃_C(x) = π_C(x/Δ_C)/Δ_C, so that the
width of π̃_C is given by Δ_C. In contrast, recall that the width of π_C is always normalised
to one by convention. Using the π̃ definition, the convolution (3.3) can simply be written as

$$ \tilde{\pi}_C(x_C) = \int dx\, \tilde{\pi}_A(x)\, \tilde{\pi}_B(x_C - x) \,, \qquad (3.4) $$

or more concisely

$$ \tilde{\pi}_C = \tilde{\pi}_A * \tilde{\pi}_B \,. \qquad (3.5) $$

The resulting distribution π_C in general has a non-trivial shape, except for example when
both π_A and π_B are Gaussian, in which case π_C is Gaussian as well. In contrast, Eq. (3.3)
implies that the magnitudes of the errors Δ_A, Δ_B are combined following

$$ \Delta_C^2 = \Delta_A^2 + \Delta_B^2 \,, \qquad (3.6) $$

irrespective of the shape of the distributions. That is, the errors are always combined in
quadrature, i.e. the variances always add up. Note that the Δ²'s correspond to the variances of
the π̃ distributions.
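Equation (3.6) can be checked by sampling. For illustration, the sketch assumes a Gaussian π_A and a flat π_B (both standardised), with hypothetical magnitudes Δ_A = 0.10 and Δ_B = 0.05. The Python sketch also estimates the excess kurtosis of the combined distribution. Since fourth cumulants also add under convolution, the excess kurtosis should be close to −1.2 Δ_B⁴/(Δ_A² + Δ_B²)² ≈ −0.05. This is much nearer the Gaussian value 0 than the flat value −1.2, illustrating how the combined prior tends to inherit the shape of the leading uncertainty.

```python
import math
import random

rng = random.Random(2015)
dA, dB = 0.10, 0.05  # hypothetical magnitudes Delta_A, Delta_B
n = 200_000
root3 = math.sqrt(3.0)
# delta_A: standard Gaussian; delta_B: standard flat, uniform on [-sqrt(3), sqrt(3)]
xs = [dA * rng.gauss(0.0, 1.0) + dB * rng.uniform(-root3, root3) for _ in range(n)]

mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n
excess_kurtosis = sum((x - mean) ** 4 for x in xs) / n / var ** 2 - 3.0

print("Delta_C sampled   :", round(math.sqrt(var), 4))
print("Delta_C, Eq. (3.6):", round(math.hypot(dA, dB), 4))
# Fourth cumulants add: expected -1.2 * dB**4 / (dA**2 + dB**2)**2 = -0.048
print("excess kurtosis   :", round(excess_kurtosis, 3))
```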

¹⁴E and V respectively denote the expected value and variance operators, E[δ] = ∫dδ δ π(δ) and
V[δ] = ∫dδ δ² π(δ) − (E[δ])².
In the case of two independent sets of several correlated variables δ_{A,i}, δ_{B,i}, with respective
covariance matrices C_A, C_B, combined as δ_{C,i} = δ_{A,i} + δ_{B,i}, the combination naturally
generalises to

$$ C_C = C_A + C_B \,. \qquad (3.7) $$

Again, this is independent of the prior shapes. The distribution of δ_{C,i} is again obtained
using Eq. (3.2).

One may also wish to combine nuisance parameters that are themselves correlated.
In the case of two nuisance parameters δ_A, δ_B with a correlation coefficient ρ, one gets

$$ \Delta_C^2 = \Delta_A^2 + \Delta_B^2 + 2 \rho\, \Delta_A \Delta_B \,, \qquad (3.8) $$

giving rise to a linear combination in the fully (anti-)correlated case ρ = ±1, and to
Eq. (3.6) in the uncorrelated case ρ = 0. The combination (3.8) is still independent of the
prior shapes. Note that in this case π_C is still obtained from Eq. (3.2), but it is no longer
given by a convolution product, because π_{A,B} no longer factorises.
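Equation (3.8) can be checked numerically in the same way. For simplicity, the sketch below assumes both δ_A and δ_B to be standard Gaussian, with δ_B built as ρδ_A + √(1 − ρ²) z from an independent standard Gaussian z. The function names and the magnitudes Δ_A = 0.10, Δ_B = 0.05 are hypothetical. For ρ = ±1 the widths combine linearly, giving |Δ_A ± Δ_B|, and for ρ = 0 they combine in quadrature.

```python
import math
import random

def combined_width(dA, dB, rho):
    """Delta_C from Eq. (3.8)."""
    return math.sqrt(dA ** 2 + dB ** 2 + 2.0 * rho * dA * dB)

def sampled_width(dA, dB, rho, n=100_000, seed=7):
    """Standard deviation of dA*delta_A + dB*delta_B for correlated standard Gaussians."""
    rng = random.Random(seed)
    xs = []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)
        # delta_B has unit variance and correlation rho with delta_A
        b = rho * a + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        xs.append(dA * a + dB * b)
    m = sum(xs) / n
    return math.sqrt(sum((x - m) ** 2 for x in xs) / n)

for rho in (-1.0, 0.0, 0.5, 1.0):
    print(rho, round(combined_width(0.1, 0.05, rho), 4),
          round(sampled_width(0.1, 0.05, rho), 4))
```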

Finally, in the case of two sets of nuisance parameters δ_{A,i}, δ_{B,i} with a relative
correlation matrix C_AB, one gets

$$ C_C = C_A + C_B + 2 C_{AB} \,. \qquad (3.9) $$

All the results of this subsection are straightforward to derive using characteristic functions
(see Appendix A).

In the limit Δ_A ≫ Δ_B, it appears that π_C ≈ π_A: the combined prior mainly has
the shape of the leading uncertainty. In Section 3.4, we demonstrate that it is well justified
to use Eq. (3.6), which is exact, together with the approximation π_C ≈ π_A. Beyond the
Δ_A ≫ Δ_B limit, if one wishes to account for the shape of π_C, a conservative approach is to
consider both extreme cases, π_C = π_A and π_C = π_B. This is because the actual shape of π_C
is always intermediate between π_A and π_B, as dictated by the convolution
product.


3.3 Frequentist combination of theoretical uncertainties

Let us start again with the nuisance parameters $\delta_A$, $\delta_B$ and their associated “prior” distribution $\pi_{A,B}$. If the nuisance parameters enter the likelihood as a single combination, $L[\delta_A\Delta_A + \delta_B\Delta_B]$, one can define the nuisance parameter $\delta_C$ as above, and write

$$L[\delta_C\Delta_C]\,\pi_C(\delta_C) \propto \max_{\delta_A,\,\delta_B}\; \delta[\delta_A\Delta_A + \delta_B\Delta_B - \delta_C\Delta_C]\; L[\delta_A\Delta_A + \delta_B\Delta_B]\; \pi_{A,B}(\delta_A,\delta_B)\,, \qquad (3.10)$$

where again $\delta[x]$ is the Dirac distribution. We emphasise that this formula is exactly analogous to the Bayesian one, Eq. (3.2), with the integration replaced by a maximisation. When $\pi_{A,B}(\delta_A,\delta_B) = \pi_A(\delta_A)\,\pi_B(\delta_B)$, it then appears that the distribution of $\delta_C$ is given by


$$\pi_C\!\left(\frac{x_C}{\Delta_C}\right) \propto \max_{x}\; \pi_A\!\left(\frac{x}{\Delta_A}\right) \pi_B\!\left(\frac{x_C - x}{\Delta_B}\right). \qquad (3.11)$$

Note that in this case, for simplicity, we used a different convention from the one-variable case: we do not factor out the magnitudes of the uncertainties ($\Delta_i$) in front of the $\delta_i$.

Here $\delta[x]$ can be taken as the regularised Dirac peak.
This formula has a convolution product structure, in which the integration has been replaced by a maximisation. From that point, it is possible to compute the frequentist correlation matrix $C$ from the curvature of the log-likelihood, $(C^{-1})_{ij} = -\partial^2 \log L/\partial\theta_i\partial\theta_j$. The general formula for the combination of $C_A$, $C_B$ is straightforward but tedious to compute. In sharp contrast with the Bayesian case, the frequentist combination of the correlation matrices $C_A$, $C_B$ according to Eq. (3.11) depends on the shape of the $\pi_A$, $\pi_B$ distributions.

In the particular case where both $\pi_A$, $\pi_B$ are Gaussian, the combination turns out to be in quadrature, as in the Bayesian case. The combination formulas then match exactly the Bayesian ones, Eqs. (3.6) and (3.7). Moreover, $\pi_C$ is also Gaussian. Another important particular case is that of flat priors. In that case, $\pi_C$ turns out to be flat, and the combination is linear,

$$\Delta_C = \Delta_A + \Delta_B\,. \qquad (3.12)$$

Note that no correlation matrix can be defined in the flat case.
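The frequentist combination of Eq. (3.11) can be evaluated on a grid by replacing the convolution integral with a maximum. The sketch below is a brute-force illustration with hypothetical helper names. The grid range and resolution are arbitrary choices. It recovers the two special cases quoted above: two Gaussian priors combine in quadrature, while two flat priors on $\delta \in [-1, 1]$ combine linearly, as in Eq. (3.12).

```python
import math

def max_convolve(pi_a, pi_b, delta_a, delta_b, x_c, n_grid=4001, x_max=1.0):
    """pi_C at x_C, up to normalisation, following Eq. (3.11):
    max over x of pi_A(x / Delta_A) * pi_B((x_C - x) / Delta_B)."""
    best = 0.0
    for k in range(n_grid):
        x = -x_max + 2.0 * x_max * k / (n_grid - 1)
        best = max(best, pi_a(x / delta_a) * pi_b((x_c - x) / delta_b))
    return best

def gauss(d):
    return math.exp(-0.5 * d * d)

def flat(d):
    # flat prior on [-1, 1]; the tiny tolerance absorbs floating-point round-off on the grid
    return 1.0 if abs(d) <= 1.0 + 1e-9 else 0.0

def gaussian_width(delta_a, delta_b, x_probe=0.1):
    """Width of the Gaussian pi_C from -2 log[pi_C(x)/pi_C(0)] = x^2 / Delta_C^2."""
    ratio = (max_convolve(gauss, gauss, delta_a, delta_b, x_probe)
             / max_convolve(gauss, gauss, delta_a, delta_b, 0.0))
    return x_probe / math.sqrt(-2.0 * math.log(ratio))

def flat_half_width(delta_a, delta_b, step=1e-3):
    """Largest x_C (on a step grid) where the max-convolution of two flat priors is non-zero."""
    x_c = 0.0
    while max_convolve(flat, flat, delta_a, delta_b, x_c + step) > 0.0:
        x_c += step
    return x_c

print(gaussian_width(0.10, 0.05), math.hypot(0.10, 0.05))   # quadrature
print(flat_half_width(0.10, 0.05), 0.10 + 0.05)             # linear
```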

In the case where $\delta_A$ and $\delta_B$ are correlated, they should be treated with a common “prior”, as in the Bayesian case.

3.4 The leading moment approximation 

Consider again the Bayesian case of a combination of two nuisance parameters, $\delta_C\Delta_C = \delta_A\Delta_A + \delta_B\Delta_B$. Recall that the $\delta$ parameters have zero mean and a standardised distribution, so that $\mathbb{E}[\delta] = 0$, $\mathbb{V}[\delta] = 1$. Assume further that the magnitude of the uncertainty B is small with respect to that of the uncertainty A,

$$\Delta_A \gg \Delta_B\,. \qquad (3.13)$$

When this condition is satisfied, the source of uncertainty B can be treated as a perturbation to the source of uncertainty A. Starting from this observation, one can obtain $\pi_C$ up to $\Delta_B/\Delta_A$ corrections (see Eq. (A.9)). This is demonstrated in Appendix A using characteristic functions. In particular, for independent variables, at the first non-trivial order in the expansion, one obtains

$$\pi_C \approx \pi_A\,, \qquad (3.14)$$

$$\Delta_C^2 = \Delta_A^2 + \Delta_B^2\,. \qquad (3.15)$$

Recall that $\pi_C$ is determined by the convolution product of $\pi_A$ and $\pi_B$. Hence for $\Delta_A \gg \Delta_B$, one can intuitively expect the shapes of $\pi_A$ and $\pi_C$ to be similar (see Eq. (3.14)), even though their widths differ (according to Eq. (3.15)). If $\delta_A$ and $\delta_B$ are correlated, Eq. (3.15) has to be replaced by Eq. (3.8).
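A simple way to quantify how fast $\pi_C$ approaches $\pi_A$ in the hierarchical limit is to track a shape-sensitive moment. The sketch below is our own illustration, not the characteristic-function derivation of Appendix A. It relies on the fact that cumulants of independent variables add. The excess kurtosis $\gamma_C = (\Delta_A^4\gamma_A + \Delta_B^4\gamma_B)/(\Delta_A^2+\Delta_B^2)^2$ therefore tends to $\gamma_A$, with corrections of relative order $\Delta_B^2/\Delta_A^2$. A uniform $\pi_A$ ($\gamma_A = -1.2$) and a Gaussian $\pi_B$ ($\gamma_B = 0$) are assumed purely for illustration. A Monte Carlo estimate cross-checks the formula.

```python
import math
import random

GAMMA_UNIFORM = -1.2   # excess kurtosis of a uniform distribution
GAMMA_GAUSS = 0.0      # excess kurtosis of a Gaussian

def kurtosis_of_sum(delta_a, gamma_a, delta_b, gamma_b):
    """Excess kurtosis of delta_A*Delta_A + delta_B*Delta_B for independent unit-variance deltas.
    Second and fourth cumulants add: k2 = Da^2 + Db^2, k4 = Da^4 gA + Db^4 gB."""
    k2 = delta_a ** 2 + delta_b ** 2
    k4 = delta_a ** 4 * gamma_a + delta_b ** 4 * gamma_b
    return k4 / k2 ** 2

def kurtosis_mc(delta_a, delta_b, n=200_000, seed=2):
    """Monte Carlo excess kurtosis for a uniform delta_A and a Gaussian delta_B."""
    rng = random.Random(seed)
    xs = [delta_a * rng.uniform(-math.sqrt(3.0), math.sqrt(3.0)) + delta_b * rng.gauss(0.0, 1.0)
          for _ in range(n)]
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3.0

for ratio in (1.0, 0.5, 0.2, 0.05):   # ratio = Delta_B / Delta_A
    print(ratio, round(kurtosis_of_sum(1.0, GAMMA_UNIFORM, ratio, GAMMA_GAUSS), 4))
```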

This “leading moment” approximation is useful in the presence of a hierarchy between the magnitudes of the various uncertainties. It dictates how to consistently capture the main effects of the uncertainties in the likelihood. This in turn allows one to obtain an approximate form for the combined priors, which opens up the possibility of obtaining analytical expressions for the marginal likelihoods.

In the multivariate case, $\delta_{A,i}$ and $\delta_{B,i}$ have in general non-trivial domains $\mathcal{D}_A$, $\mathcal{D}_B$. The combined domain $\mathcal{D}_C$ is given by the distance $\|\delta_{C,i}\|$ for which the centres of $\mathcal{D}_A$ and $\mathcal{D}_B$ are aligned with $\delta_{C,i}$ and the domains $\mathcal{D}_A$ and $\mathcal{D}_B$ share a single point. For example, if $\mathcal{D}_A$, $\mathcal{D}_B$ are “hyper-rectangles” with sizes $\Delta_{A,i}$, $\Delta_{B,i}$, the sizes simply add up just like in the one-dimensional case, $\Delta_{C,i} = \Delta_{A,i} + \Delta_{B,i}$.

The leading moment approximation also applies when $\delta_A$ and $\delta_B$ appear in various linear combinations within the likelihood. This situation typically happens when various observables are affected by the same source of uncertainty. The case of two nuisance parameters and two combinations is discussed in Appendix A. One considers two combinations, $\delta_{C_1}\Delta_{C_1} = \delta_A\Delta_{A_1} + \delta_B\Delta_{B_1}$ and $\delta_{C_2}\Delta_{C_2} = \delta_A\Delta_{A_2} + \delta_B\Delta_{B_2}$. It is found that the $\Delta_{C_{1,2}}$ are obtained as in the one-combination case discussed above. The correlation coefficient between $\delta_{C_1}$ and $\delta_{C_2}$ requires more attention. If $\Delta_{A_1} \gg \Delta_{B_1}$ and $\Delta_{A_2} \gg \Delta_{B_2}$, it is found to be approximately equal to one. This implies that the shapes of the distributions of $\delta_{C_1}$, $\delta_{C_2}$ and $\delta_A$ are the same up to $\Delta_{B_{1,2}}/\Delta_{A_{1,2}}$ corrections (see Eq. (A.15)), that is

$$\pi_{C_1C_2}(\delta_{C_1},\delta_{C_2}) \approx \pi_A(\delta_{C_1})\,\delta[\delta_{C_1} - \delta_{C_2}]\,. \qquad (3.16)$$

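From Eq. (3.16), it appears that the leading moment approximation reduces the number of nuisance parameters in the likelihood. In the case where $\Delta_{A_1} \gg \Delta_{B_1}$ and $\Delta_{A_2} \ll \Delta_{B_2}$, the correlation coefficient between $\delta_{C_1}$ and $\delta_{C_2}$ is approximately equal to the correlation coefficient between $\delta_A$ and $\delta_B$ (see Eq. (A.16)), so that

$$\pi_{C_1C_2} \approx \pi_{AB}\,. \qquad (3.17)$$

In the particular case where $\delta_A$ and $\delta_B$ are independent, one has

$$\pi_{C_1C_2} \approx \pi_{C_1}\pi_{C_2}\,, \qquad \pi_{C_1} \approx \pi_A\,, \qquad \pi_{C_2} \approx \pi_B\,. \qquad (3.18)$$

In the other particular case where $\delta_A$ and $\delta_B$ are 100% correlated or anti-correlated ($\rho_{AB} = \pm 1$), one has

$$\pi_{C_1C_2}(\delta_{C_1},\delta_{C_2}) \approx \pi_A(\delta_{C_1})\,\delta[\delta_{C_1} \mp \delta_{C_2}]\,. \qquad (3.19)$$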

All the cases with more variables or more combinations can be deduced recursively from 
the case with two parameters and two combinations studied here. 
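The statements of Eqs. (3.16)-(3.18) about the correlation between $\delta_{C_1}$ and $\delta_{C_2}$ follow from the covariance of the two linear combinations, which does not depend on the prior shapes. The sketch below uses a hypothetical function name and unit-variance $\delta_A$, $\delta_B$ with correlation $\rho_{AB}$, as in our conventions. It evaluates this correlation coefficient in the two hierarchical regimes discussed above.

```python
import math

def combination_correlation(d_a1, d_b1, d_a2, d_b2, rho=0.0):
    """Correlation between delta_C1 Delta_C1 = delta_A Delta_A1 + delta_B Delta_B1
    and delta_C2 Delta_C2 = delta_A Delta_A2 + delta_B Delta_B2,
    for unit-variance delta_A, delta_B with correlation rho."""
    cov = d_a1 * d_a2 + d_b1 * d_b2 + rho * (d_a1 * d_b2 + d_b1 * d_a2)
    var1 = d_a1 ** 2 + d_b1 ** 2 + 2.0 * rho * d_a1 * d_b1
    var2 = d_a2 ** 2 + d_b2 ** 2 + 2.0 * rho * d_a2 * d_b2
    return cov / math.sqrt(var1 * var2)

# A dominates both combinations: correlation close to one, cf. Eq. (3.16)
print(combination_correlation(1.0, 0.01, 1.0, 0.02, rho=0.3))
# A dominates C1 and B dominates C2: correlation close to rho_AB, cf. Eq. (3.17)
print(combination_correlation(1.0, 0.01, 0.01, 1.0, rho=0.3))
```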

3.5 Combining uncertainties in the bias approach 

We now analyse how uncertainties are combined in the bias method. We still consider a combination of nuisance parameters $\delta_{A,B}$ entering the likelihood as $L[\delta_A\Delta_A + \delta_B\Delta_B]$. Recall that in our conventions, $\delta$ is a random variable with a fixed domain, while $\Delta$ is a number representing the magnitude of the uncertainty. In the bias approach, by definition, the shape of the distribution of $\delta$ is set so that $\delta$ does not participate in the fit. The information about the uncertainty is thus encoded only in the domain of the variable $\delta\Delta$. The choice of this domain has some degree of arbitrariness and depends on how conservative one wants the results to be. In the following we choose to let $\delta$ vary in the interval $[-1,1]$ and we identify $\Delta$ as a $1\sigma$ error, i.e. in the same way as it is defined for the marginalisation.

This leading moment approximation will be applied to the theoretical uncertainties on the Higgs rates in Sections 6.4 and 6.5.
The operation of Bayesian bias can be seen as a special case of marginalisation, where the prior is set by Eq. (2.13). As the likelihood we consider in this section depends only on the combination $\delta_A\Delta_A + \delta_B\Delta_B$, this peculiar prior depends only on $\delta_A\Delta_A + \delta_B\Delta_B$ by construction. Let us denote it as $\pi^{\rm bias}(\delta_A\Delta_A + \delta_B\Delta_B)$. In order to get the combination $\delta_C\Delta_C = \delta_A\Delta_A + \delta_B\Delta_B$, one applies the definition of Eq. (3.2) using the bias prior. It turns out that $\pi_C(\delta_C) = \pi^{\rm bias}(\delta_C\Delta_C)$. This means that the domain of $\delta_C\Delta_C$ is given by the domain of $\delta_A\Delta_A + \delta_B\Delta_B$,

$$\mathcal{D}_{\delta_C\Delta_C} = \mathcal{D}_{\delta_A\Delta_A + \delta_B\Delta_B}\,. \qquad (3.20)$$

When $\delta_A$ and $\delta_B$ are independent, one simply has

$$\Delta_C = \Delta_A + \Delta_B\,. \qquad (3.21)$$

When $\delta_A$ and $\delta_B$ are 100% positively correlated (i.e. $\delta_A = \delta_B$), one again obtains

$$\Delta_C = \Delta_A + \Delta_B\,. \qquad (3.22)$$

When $\delta_A$ and $\delta_B$ are 100% negatively correlated (i.e. $\delta_A = -\delta_B$), the combination reads

$$\Delta_C = |\Delta_A - \Delta_B|\,. \qquad (3.23)$$

Let us stress that the correlation between $\delta_A$ and $\delta_B$ is determined by their common domain $\mathcal{D}_{\delta_A\Delta_A,\,\delta_B\Delta_B}$. The above extreme cases are easily determined. An intermediate correlation is trickier, as it requires a precise definition of the domain; such a case will not be needed in this paper. We see that the uncertainties are automatically combined linearly in the Bayesian bias method.

The results above can be applied recursively to more complex combinations. For example, if $\delta_D\Delta_D = \delta_A\Delta_A + \delta_B\Delta_B + \delta_C\Delta_C$, with $\delta_A$ and $\delta_B$ 100% anti-correlated and $\delta_C$ independent of the two others, the bias combination gives

$$\Delta_D = |\Delta_A - \Delta_B| + \Delta_C\,. \qquad (3.24)$$
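In the bias approach the combined magnitude is simply the half-width of the domain of the linear combination, with each $\delta$ in $[-1,1]$. A linear form reaches its extrema at the vertices of its polytope domain. The combinations (3.21)-(3.24) can therefore be checked by enumerating vertices, as in the sketch below. The helper names are hypothetical, and the correlation pattern is encoded by the list of allowed vertices. For comparison, the sketch also prints the quadrature result of Eq. (3.8).

```python
import math
from itertools import product

def bias_width(deltas, vertices):
    """Half-width of the domain of sum_k delta_k * Delta_k; a linear form on a polytope
    reaches its extrema at the vertices, so scanning them is enough."""
    return max(abs(sum(d * v for d, v in zip(deltas, vert))) for vert in vertices)

def independent(n):
    # every delta varies freely in [-1, 1]: vertices of the hypercube
    return list(product((-1.0, 1.0), repeat=n))

CORRELATED = [(1.0, 1.0), (-1.0, -1.0)]        # delta_A = delta_B
ANTICORRELATED = [(1.0, -1.0), (-1.0, 1.0)]    # delta_A = -delta_B

d_a, d_b, d_c = 0.10, 0.05, 0.02
print(bias_width((d_a, d_b), independent(2)))     # Eq. (3.21): Delta_A + Delta_B
print(bias_width((d_a, d_b), CORRELATED))         # Eq. (3.22): Delta_A + Delta_B
print(bias_width((d_a, d_b), ANTICORRELATED))     # Eq. (3.23): |Delta_A - Delta_B|
# Eq. (3.24): delta_A = -delta_B, delta_C independent
mixed = [(s, -s, t) for s in (-1.0, 1.0) for t in (-1.0, 1.0)]
print(bias_width((d_a, d_b, d_c), mixed))
# Marginalisation in quadrature, Eq. (3.8), never exceeds the bias result
for rho in (-1.0, 0.0, 1.0):
    print(rho, math.sqrt(max(d_a ** 2 + d_b ** 2 + 2.0 * rho * d_a * d_b, 0.0)))
```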

The bias combination also applies in the presence of various linear combinations (labelled by $i$) of the same nuisance parameters. In that case, the result of the combination is a common nuisance parameter $\delta$, coming with a different magnitude $\Delta_i$ for each combination.

The frequentist bias has the same structure as the Bayesian bias. The starting point to determine the error combination is to use the frequentist version of the bias prior of Eq. (2.16) in Eq. (3.10). It follows that the frequentist combinations are the same as in the Bayesian case. We can thus conclude that in the bias approach, the preliminary combinations of uncertainties are done linearly, in both the frequentist and Bayesian cases. Such a combination is systematically at least as conservative as the combinations obtained from both the Bayesian and frequentist marginalisations. This can be seen by comparing Eqs. (3.21), (3.22), (3.23) with, for example, Eq. (3.8). One exception is the frequentist marginalisation with flat priors, whose combination (see e.g. Eq. (3.12)) coincides with the bias one. Apart from that case, the bias method is thus more conservative than the standard marginalisation at the level of error combinations.
4 The Higgs boson rates


The couplings of the Higgs boson $h$ are all predicted in the Standard Model, so that any deviation from the SM predictions would constitute a sign of physics beyond the SM. The Higgs couplings can be probed by collider experiments, which can produce the Higgs boson on-shell and observe its decays. This process of Higgs production followed by its decay is parametrised as

$$pp\ (p\bar p) \to h \to Y\,. \qquad (4.1)$$

The SM Higgs production mechanisms accessible at the LHC (and the Tevatron) are the following:
i) gluon-gluon fusion (ggF);
ii) vector boson fusion (VBF);
iii) associated production with an electroweak gauge boson $V = W, Z$ (VH);
iv) associated production with a $t\bar t$ pair (ttH).
The main SM Higgs decays observed at colliders are decays into gauge bosons, $h \to \gamma\gamma, ZZ, W^+W^-$, and into heavy fermions, $h \to b\bar b, \tau^+\tau^-$. The production modes $X$ and final states $Y$ are therefore taken from the following lists:

$$X = \{{\rm ggF},\ {\rm VBF},\ {\rm VH},\ {\rm ttH}\}\,, \qquad (4.2)$$

$$Y = \{\gamma\gamma,\ ZZ,\ WW,\ b\bar b,\ \tau\tau\}\,. \qquad (4.3)$$


4.1 The data 

The Higgs searches at ATLAS, CMS and the Tevatron each focus on a specific final state $Y$. For each final state, various channels are defined using mutually exclusive cuts. Throughout this paper, these experimental channels will be labelled by lower-case Latin indices ($i, j, \ldots$). We will consider all 88 channels. A given index $i$ contains the information on both the final state and the specific channel. In the following, it will sometimes be useful to refer to the final state $Y$ corresponding to a given channel $i$. We will use the short notation $Y_i$, meaning that $Y$ is taken as a function of the variable $i$, i.e. $Y_i = Y(i)$.

The results from Higgs searches at the LHC and the Tevatron are reported in terms of signal strengths $\hat\mu_i$. A signal strength is defined as the ratio of the observed event number to the expected SM event number,

$$\hat\mu_i = \frac{N_i^{\rm ex}}{N_i^{\rm SM}}\,. \qquad (4.4)$$

The predicted SM event rate of a process $pp\ (p\bar p) \to h \to Y$ is given, in the narrow width approximation, by $\sigma_X^{\rm SM} B_Y^{\rm SM} \mathcal{L}$. Here $\sigma_X^{\rm SM}$ is the production cross section, $B_Y^{\rm SM}$ is the branching ratio, $B_Y^{\rm SM} = \Gamma_Y^{\rm SM}/\sum_{Y'}\Gamma_{Y'}^{\rm SM}$, and $\mathcal{L}$ is the integrated luminosity. However, from the experimental viewpoint, all the production processes contribute to a given final state. Hence the Higgs production cross sections have to be weighted by a selection efficiency $\epsilon_{X,i}$ encoding the effects of kinematical cuts. The actual expected event rates are thus given by

$$N_i^{\rm SM} = \sum_X \epsilon_{X,i}\, \sigma_X^{\rm SM}\, B_i^{\rm SM}\, \mathcal{L}\,, \qquad (4.5)$$
where the notation $B_i^{\rm SM}$ is a shortcut for $B_{Y_i}^{\rm SM}$, i.e. the index $i$ selects the final state $Y$.

The experimental Higgs signal strengths thus have the form

$$\hat\mu_i = \frac{N_i^{\rm ex}}{\sum_X \epsilon_{X,i}\, \sigma_X^{\rm SM}\, B_i^{\rm SM}\, \mathcal{L}}\,. \qquad (4.6)$$

Note that the kinematical cuts have been designed, to some extent, to disentangle the production modes, so that often one of the efficiencies will dominate over the others.
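As a minimal sketch of Eqs. (4.5) and (4.6), the snippet below computes $N_i^{\rm SM}$ and $\hat\mu_i$ for one channel. All inputs are hypothetical placeholders chosen only to make the example run: efficiencies, cross sections in pb, branching ratio, luminosity in pb$^{-1}$ and observed events. They are not the inputs used in this analysis.

```python
def expected_sm_events(eff, sigma_sm, br_sm, lumi):
    """N_i^SM = sum_X eps_{X,i} sigma_X^SM B_i^SM L, Eq. (4.5).
    eff and sigma_sm are dicts keyed by production mode X."""
    return sum(eff[x] * sigma_sm[x] for x in eff) * br_sm * lumi

def signal_strength(n_ex, eff, sigma_sm, br_sm, lumi):
    """Experimental signal strength, Eq. (4.6)."""
    return n_ex / expected_sm_events(eff, sigma_sm, br_sm, lumi)

# Hypothetical inputs for a single channel i
eff = {"ggF": 0.50, "VBF": 0.10, "VH": 0.02, "ttH": 0.01}
sigma_sm = {"ggF": 19.0, "VBF": 1.6, "VH": 1.1, "ttH": 0.13}   # pb
br_sm = 2.3e-3
lumi = 20_000.0                                               # pb^-1, i.e. 20 fb^-1

n_sm = expected_sm_events(eff, sigma_sm, br_sm, lumi)
print(n_sm, signal_strength(500.0, eff, sigma_sm, br_sm, lumi))
```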

The experimental central values of the $\hat\mu_i$, the associated statistical errors, the experimental systematic errors and the selection efficiencies used in our analysis are taken from the references below. The statistical and experimental systematic errors are often combined within these references, and their combination will be denoted here as $\Delta\hat\mu_i$. Regarding the ATLAS data, the diphoton final state results are taken from Ref. [29], the $ZZ$ channel from Ref. [30], the $WW$ channel from Ref. [31], the $b\bar b$ channel from Ref. [32] and the $\tau\tau$ channel from Ref. [33]. Results are also presented in Ref. [6], and the combined channels are studied in Ref. [4].

As for the CMS results, the diphoton final state has been presented in Ref. [34], the $ZZ$ channel measurements are provided in Ref. [35], the $WW$ ones in Ref. [36], the $b\bar b$ in Ref. [37] and the $\tau\tau$ in Ref. [38] (see also Ref. [7] and the combined channel analyses [5]). Finally, the latest results from the Tevatron (D0 and CDF Collaborations) can be found in Refs. [39, 40].

Apart from statistical and experimental systematic errors, certain theoretical errors on $\hat\mu_i$ are included in the public results. To the best of our knowledge, these experimental and theoretical uncertainties are often combined in quadrature. We thus subtract these theoretical errors in quadrature from the provided total uncertainties. How to properly (re)introduce the theoretical errors constitutes the main topic of this paper, and will be discussed at length in the upcoming sections.
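The quadrature subtraction mentioned above amounts to $\Delta^{\rm ex} = \sqrt{\Delta_{\rm tot}^2 - \Delta_{\rm th}^2}$ for each channel. The minimal sketch below assumes symmetric errors and uses a hypothetical function name. Asymmetric errors would have to be treated side by side.

```python
import math

def remove_theory_error(total, theory):
    """Experimental-only error, assuming total^2 = experimental^2 + theory^2."""
    if theory > total:
        raise ValueError("theoretical error larger than the quoted total error")
    return math.sqrt(total ** 2 - theory ** 2)

print(remove_theory_error(0.50, 0.30))   # sqrt(0.25 - 0.09)
```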

Finally, we mention that our fits do not include more challenging observables. These are related to Higgs pair production [41], off-shell effects, the loop-induced $Z\gamma$ final state, electron/muon pair final states, final states induced by flavour-changing Higgs couplings, and exotic or invisible final states. Some of those would require introducing new parameters in the Lagrangian that we will consider in Eq. (4.7). The motivation is to keep a simple physical framework in order to discuss the statistical aspects easily. In any case, the present experimental limits on such Higgs observables are still not stringent enough to drastically affect the Higgs fits. Moreover, all the statistical concepts discussed throughout the paper can be straightforwardly extended to new Higgs observables.


4.2 New physics parametrisation 

The new physics possibly lying beyond the SM may distort the SM Higgs couplings. The correct way of dealing with the low-energy manifestation of heavy new physics is through the use of an effective Lagrangian (see e.g. Ref. [16] for global fits of the Higgs effective Lagrangian). The leading effects on the Higgs sector appear through dimension-6 operators. The effective Lagrangian then induces anomalous couplings between the Higgs boson and the SM particles. The anomalous couplings to weak bosons and to heavy fermions can be parametrised as

$$\mathcal{L} = c_W\, g_{hWW}\, h\, W^+_\mu W^{-\mu} + c_Z\, g_{hZZ}\, h\, Z_\mu Z^\mu - c_t\, y_t\, h\, \bar t_L t_R - c_b\, y_b\, h\, \bar b_L b_R - c_c\, y_c\, h\, \bar c_L c_R - c_\tau\, y_\tau\, h\, \bar\tau_L \tau_R + {\rm h.c.}\,, \qquad (4.7)$$

The symbols are defined as follows:
- $y_{t,b,c,\tau}$ are the SM Yukawa coupling constants (in the mass eigenbasis);
- the subscript $L/R$ indicates the fermion chirality;
- $g_{hWW}$ and $g_{hZZ}$ are the SM Higgs couplings to the EW gauge bosons, which are fixed in terms of the gauge boson masses and the Higgs vacuum expectation value $v$.

The $c_{W,Z,t,b,c,\tau}$ parameters are defined such that the limiting case $c_{W,Z,t,b,c,\tau} = 1$ corresponds to the SM. New tensor structures are also generated by the effective Lagrangian but are not taken into account here.

Our focus being on theoretical uncertainties, we adopt a fairly simple parametrisation of the new physics effects. We assume universal deviations for the fermion couplings, $c_f = c_t = c_b = c_c = c_\tau$, and for the weak bosons, $c_V = c_W = c_Z$. The $c$'s are assumed to be real. Clearly, this simplified description of the new physics effects represents only a piece (operators with no extra derivatives) of the full dimension-6 effective Lagrangian. Having $c_W \simeq c_Z$ and $c_f$ universality is, however, approximately compatible with certain new physics scenarios. One example is a warped extra dimension with bulk custodial symmetry and vanishing IR brane kinetic terms for the EW gauge bosons [42, 43]. With only two parameters in this simplified framework, the results of our fits will systematically be presented in the $(c_V, c_f)$ plane.

In the hypothesis of physics beyond the SM (BSM) parametrised by $(c_V, c_f)$, the expected signal strength is given by

$$\mu_i[c_V,c_f] = \frac{N_i^{\rm BSM}[c_V,c_f]}{N_i^{\rm SM}} = \frac{\sum_X \epsilon_{X,i}^{\rm BSM}\, \sigma_X^{\rm BSM}\, B_i^{\rm BSM}}{\sum_X \epsilon_{X,i}^{\rm SM}\, \sigma_X^{\rm SM}\, B_i^{\rm SM}}\,, \qquad (4.8)$$

with $N_i^{\rm SM}$ defined in Eq. (4.5). This is the theoretical prediction for the experimental signal strength defined in Eq. (4.6). Both the BSM cross sections $\sigma_X^{\rm BSM}$ and branching ratios $B_i^{\rm BSM}$ can be expressed in terms of the SM amplitudes and of $c_V, c_f$. The expressions can, for example, be found in Ref. [44], whose procedure is closely followed here. In all generality, the BSM efficiencies may also differ from the SM ones. However, this happens when couplings with new tensor structures are generated by new physics, which is not the case in our simplified framework. One can thus safely take $\epsilon_{X,i}^{\rm BSM} = \epsilon_{X,i}^{\rm SM}$.
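With $\epsilon_{X,i}^{\rm BSM} = \epsilon_{X,i}^{\rm SM}$, Eq. (4.8) reduces to an efficiency-weighted average of the cross-section ratios $\sigma_X^{\rm BSM}/\sigma_X^{\rm SM}$, times the branching-ratio ratio. The sketch below takes these ratios as inputs. In the example, the production ratios are set to $c_V^2$ for VBF and VH and to $c_f^2$ for ggF and ttH. This leading-order scaling is assumed here only for illustration, with ggF approximated as top-loop dominated. The branching-ratio ratio is a hypothetical input rather than the full expressions of Ref. [44], and all numbers are placeholders.

```python
def bsm_signal_strength(eff, sigma_sm, sigma_ratio, br_ratio):
    """mu_i[c_V, c_f] of Eq. (4.8) with eps^BSM = eps^SM.
    sigma_ratio[X] = sigma_X^BSM / sigma_X^SM; br_ratio = B_i^BSM / B_i^SM."""
    num = sum(eff[x] * sigma_sm[x] * sigma_ratio[x] for x in eff)
    den = sum(eff[x] * sigma_sm[x] for x in eff)
    return br_ratio * num / den

def production_ratios(c_v, c_f):
    # Illustrative leading-order scalings (assumption, not the full treatment of Ref. [44])
    return {"ggF": c_f ** 2, "VBF": c_v ** 2, "VH": c_v ** 2, "ttH": c_f ** 2}

eff = {"ggF": 0.50, "VBF": 0.10, "VH": 0.02, "ttH": 0.01}       # hypothetical
sigma_sm = {"ggF": 19.0, "VBF": 1.6, "VH": 1.1, "ttH": 0.13}    # hypothetical, pb
print(bsm_signal_strength(eff, sigma_sm, production_ratios(1.0, 1.0), 1.0))  # SM point
print(bsm_signal_strength(eff, sigma_sm, production_ratios(1.1, 0.9), 1.0))
```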

The SM production cross sections of the Higgs boson are taken from the LHC Higgs Cross Section Working Group (LHCHWG), Ref. [17] (see also Refs. [18-20], as well as the recent N$^3$LO ggF computation [25]). The SM partial decay widths are taken from Refs. [17, 20]. These numerical results correspond to rates calculated at the highest orders of EW and QCD corrections known so far: mixed EW-QCD at NNLO for the ggF mechanism [27], and NLO for the other Higgs production modes.

Note that, contrary to a widespread belief, $c_W = c_Z$ is not entirely justified by custodial symmetry [42].
5 The Higgs likelihood

5.1 The base likelihood

Having introduced the statistical framework and the Higgs data in Sections 2 to 4, we can proceed with building the Higgs likelihood function. We define the base likelihood $L_0$ as the likelihood containing the central values of the Higgs signal strengths and the experimental uncertainties. The theoretical uncertainties are set aside for now. Their inclusion into the base likelihood will be discussed at length in the next sections and is the central topic of this paper.

In the absence of any experimental systematic errors, a signal strength variable follows Poisson statistics, and the associated likelihood is thus a Poisson distribution. Whenever the event number is large enough, about $\mathcal{O}(10)$ in practice, the likelihood can be approximated by a Gaussian. In contrast, in the presence of systematic uncertainties, this approximation generally does not hold. In practice, however, the complete likelihood resulting from the combination of statistical and experimental systematic errors is not provided in the experimental public results. We will therefore model the base likelihood using Gaussian distributions, just as if the shape came only from the statistical error. Such an approximation is expected to be good as long as the systematic error is small with respect to the statistical error, as shown in Section 3.4 and Appendix A.

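The observed rates in the current 88 channels (labelled by $i, j$) are potentially correlated, for example because of the experimental error on the luminosity. The base likelihood therefore follows a multivariate normal distribution,

$$L_0 \propto \exp\left[-\frac{1}{2}\sum_{i,j}\big(\hat\mu_i - \mu_i[c_V,c_f]\big)\,\big(C^{\rm ex}\big)^{-1}_{ij}\,\big(\hat\mu_j - \mu_j[c_V,c_f]\big)\right], \qquad (5.1)$$

where $C^{\rm ex}_{ij}$ is the experimental covariance matrix among all channels.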

Ideally, each individual observed channel $i$ should be considered in order to take into account all the experimental information available on the signal strengths. In practice, few elements of this correlation matrix have been provided by the Collaborations up to now. Therefore, in the following we include only the diagonal elements of $C^{\rm ex}_{ij}$, given by $C^{\rm ex}_{ii} = (\Delta\hat\mu_i)^2$, where $\Delta\hat\mu_i$ is the experimental uncertainty extracted from the public experimental results. For future releases, we encourage the experimental Collaborations to provide as many elements as possible of the correlation matrix of the individual signal strengths.
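With only diagonal elements kept, $-2\log L_0$ of Eq. (5.1) reduces, up to a constant, to a sum of squared pulls. The sketch below uses hypothetical names and numbers. It evaluates this sum and, for illustration, finds the best-fit value of a single common signal strength $\mu$ across channels. For such a common $\mu$, the minimum is the inverse-variance weighted mean.

```python
def minus_two_log_l0(mu_hat, mu_th, err):
    """-2 log L_0 up to a constant, with diagonal C^ex_ii = err_i^2 (correlations neglected)."""
    return sum(((m - t) / e) ** 2 for m, t, e in zip(mu_hat, mu_th, err))

def best_common_mu(mu_hat, err):
    """Minimiser of minus_two_log_l0 when all channels share one signal strength mu."""
    weights = [1.0 / e ** 2 for e in err]
    return sum(w * m for w, m in zip(weights, mu_hat)) / sum(weights)

mu_hat = [1.2, 0.8, 1.5]     # hypothetical measured signal strengths
err = [0.2, 0.4, 0.5]        # hypothetical experimental errors Delta mu_hat_i
mu_best = best_common_mu(mu_hat, err)
print(mu_best, minus_two_log_l0(mu_hat, [mu_best] * 3, err))
```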

Alternatively, one could think of performing the Higgs fits with the correlations between the combined observed rates that are currently provided by the LHC Collaborations. Although instructive, these combined rates do not keep track of all the information. They group together different Higgs production modes (which were originally measured independently), like $\mu^{\rm VBF+VH}$ and $\mu^{\rm ggF+ttH}$ for each Higgs decay channel [6, 7]. Notice that such combined signal strengths also hide some information, in the sense that they can result from summations over various exclusive selection cut categories.

Also, we suggest that both the magnitudes of the uncertainties and the correlations be presented without ambiguity, so that people outside the Collaborations are able to properly reconstruct the likelihood function.

5.2 The uncertainty on the signal strengths 

The Higgs theoretical uncertainties we refer to are those associated with the expected event rates $N_i^{\rm SM}$ defined in Eq. (4.5), which are obtained through analytical and numerical computations in quantum field theory. These uncertainties propagate both into the experimental signal strengths and into the theoretical signal strengths defined in Eqs. (4.6) and (4.8). Following our conventions (see Section 3, Eq. (3.1)), the theoretical uncertainty on the Standard Model expected rate in a channel $i$ is written in the form

$$N_i^{\rm SM}(1 + \delta_i\Delta_i)\,, \qquad (5.2)$$

where $\delta_i$ is the nuisance parameter, with $\mathbb{E}[\delta_i] = 0$, $\mathbb{V}[\delta_i] = 1$, and $\Delta_i$ represents the relative magnitude of the uncertainty.

The theoretical uncertainty on $N_i^{\rm SM}$ propagates to the experimental signal strength as

$$\frac{N_i^{\rm ex}}{N_i^{\rm SM}(1 + \delta_i\Delta_i)} = \hat\mu_i\,(1 - \delta_i\Delta_i) + \mathcal{O}(\Delta_i^2)\,. \qquad (5.3)$$

For symmetric errors, the sign in front of $\delta_i\Delta_i$ is a matter of convention, since it can be absorbed in $\delta_i \to -\delta_i$.

The case of the theoretical signal strength $N_i^{\rm BSM}/N_i^{\rm SM}$ is slightly trickier. Here we focus on the most realistic case where the deviations induced by new physics are small. The anomalous couplings $c_a$ (with $a = W, Z, t, b, c, \tau$) are then close to one, i.e. $|c_a - 1| \ll 1$. The contributions from new physics can be linearised with respect to the small parameters $c_a - 1$, so that the BSM event rate in channel $i$ can be written as

$$N_i^{\rm BSM} \simeq N_i^{\rm SM} + \sum_a (c_a - 1)\, N_{a,i}^{\rm BSM}\,. \qquad (5.4)$$

In this expression, the leading source of uncertainty comes from the SM event rate uncertainty $\Delta_i$. In the expression of $\mu_i$, this uncertainty cancels out at first order between the numerator ($N_i^{\rm BSM}$) and the denominator ($N_i^{\rm SM}$). The subleading uncertainties then come from two sources. The first is a term quadratic in $\Delta_i$. The second is the relative uncertainty $(c_a - 1)\,\Delta_{a,i}^{\rm BSM}$ on the components $N_{a,i}^{\rm BSM}$. Notice that one can reasonably expect similar QCD errors in the SM and BSM predictions, so that $\Delta_{a,i}^{\rm BSM} \approx \Delta_i$. These higher-order contributions are subleading compared to the error on the experimental signal strength, given in Eq. (5.3), which is of order $\Delta_i$. In the following, we thus focus only on the uncertainty of the experimental signal strength, $\hat\mu_i(1 + \delta_i\Delta_i)$.
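The first-order cancellation of $\Delta_i$ in the theoretical signal strength can be made concrete with a single anomalous coupling. The sketch below uses hypothetical numbers and names. Both $N_i^{\rm SM}$ and the BSM component $N_{a,i}^{\rm BSM}$ of Eq. (5.4) are shifted by the same nuisance parameter $\delta$, with relative sizes $\Delta_i$ and $\Delta_{a,i}^{\rm BSM}$; this amounts to assuming fully correlated QCD errors. The sensitivity $d\mu_i/d\delta$ is then of order $(c_a-1)(\Delta_{a,i}^{\rm BSM}-\Delta_i)$, compared with $\Delta_i$ for the experimental signal strength.

```python
def theory_mu(delta, n_sm, n_a, c, rel_sm, rel_a):
    """mu_i = N_i^BSM / N_i^SM with the linearised N_i^BSM of Eq. (5.4) for one coupling c;
    both rates are shifted by the same nuisance parameter delta."""
    num = n_sm * (1.0 + delta * rel_sm) + (c - 1.0) * n_a * (1.0 + delta * rel_a)
    den = n_sm * (1.0 + delta * rel_sm)
    return num / den

def experimental_mu(delta, mu_hat, rel_sm):
    """Experimental signal strength with the theory error on N_i^SM, Eq. (5.3)."""
    return mu_hat / (1.0 + delta * rel_sm)

def slope(f, h=1e-6):
    # central finite difference at delta = 0
    return (f(h) - f(-h)) / (2.0 * h)

print(slope(lambda d: theory_mu(d, 100.0, 80.0, 1.05, 0.10, 0.12)))   # ~ 0.05 * 0.8 * 0.02
print(slope(lambda d: experimental_mu(d, 1.0, 0.10)))                 # ~ -0.10
```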

5.3 The structure of the Higgs theoretical uncertainties 

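The theoretical uncertainty on $N_i^{\rm SM}$ comes from the errors on the Higgs cross sections $\sigma_X^{\rm SM}$ and partial decay widths $\Gamma_Y^{\rm SM}$. Still following our conventions, these relative uncertainties are written as

$$\sigma_X^{\rm SM}(1 + \delta_X^\sigma\Delta_X^\sigma)\,, \qquad (5.5)$$

$$\Gamma_Y^{\rm SM}(1 + \delta_Y^\Gamma\Delta_Y^\Gamma)\,. \qquad (5.6)$$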
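The exact content of these errors will be discussed in detail in the next section.

The uncertainty on the partial decay widths propagates to the branching ratios. Defining the relative error on the branching ratios as $B_Y^{\rm SM}(1 + \delta_Y^B\Delta_Y^B)$, one has, at first order,

$$\delta_Y^B\Delta_Y^B = \sum_{Y'}\left(\delta_{YY'} - B_{Y'}^{\rm SM}\right)\delta_{Y'}^\Gamma\Delta_{Y'}^\Gamma\,. \qquad (5.7)$$

The uncertainty from the cross sections and branching ratios then propagates to the signal strength (4.6) and is thus encoded in a factor $\hat\mu_i(1 + \delta_i\Delta_i)$, where

$$\delta_i\Delta_i = -\left[\frac{\sum_X \epsilon_{X,i}\,\sigma_X^{\rm SM}\,\delta_X^\sigma\Delta_X^\sigma}{\sum_{X'} \epsilon_{X',i}\,\sigma_{X'}^{\rm SM}} + \delta_{Y_i}^B\Delta_{Y_i}^B\right] = -\left[\frac{\sum_X \epsilon_{X,i}\,\sigma_X^{\rm SM}\,\delta_X^\sigma\Delta_X^\sigma}{\sum_{X'} \epsilon_{X',i}\,\sigma_{X'}^{\rm SM}} + \sum_{Y'}\left(\delta_{Y_iY'} - B_{Y'}^{\rm SM}\right)\delta_{Y'}^\Gamma\Delta_{Y'}^\Gamma\right], \qquad (5.8)$$

$Y_i = Y(i)$ being the decay mode $Y$ of the Higgs detection channel $i$. Note that the sign after the first equality is just a convention if the errors are symmetric.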
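Equations (5.7) and (5.8) are linear in the elementary errors. For each channel, they define a set of coefficients multiplying the $\delta_X^\sigma\Delta_X^\sigma$ and the $\delta_{Y'}^\Gamma\Delta_{Y'}^\Gamma$. The sketch below builds these coefficients, using hypothetical names and numbers. It checks two properties that follow directly from the formulas. First, the production weights sum to one. Second, a common relative shift of all partial widths leaves the branching ratios unchanged when the $B_{Y'}^{\rm SM}$ sum to one. Combining the coefficients in quadrature, as in the last function, additionally assumes independent nuisance parameters, following Eq. (3.6).

```python
import math

def error_coefficients(final_state, eff, sigma_sm, br_sm):
    """Coefficients of Eq. (5.8): delta_i Delta_i = sum_k coeff[k] * (delta_k Delta_k),
    with k = ("sigma", X) for cross-section errors and ("Gamma", Y') for width errors."""
    den = sum(eff[x] * sigma_sm[x] for x in eff)
    coeff = {("sigma", x): -eff[x] * sigma_sm[x] / den for x in eff}
    for y, b in br_sm.items():
        kronecker = 1.0 if y == final_state else 0.0
        coeff[("Gamma", y)] = -(kronecker - b)        # from Eq. (5.7)
    return coeff

def quadrature_error(coeff, rel_err):
    """Delta_i for independent unit-variance nuisance parameters, cf. Eq. (3.6)."""
    return math.sqrt(sum((c * rel_err.get(k, 0.0)) ** 2 for k, c in coeff.items()))

eff = {"ggF": 0.50, "VBF": 0.10}                                   # hypothetical
sigma_sm = {"ggF": 19.0, "VBF": 1.6}                                # hypothetical, pb
br_sm = {"bb": 0.58, "WW": 0.22, "gg": 0.09, "tautau": 0.06, "other": 0.05}   # hypothetical
coeff = error_coefficients("WW", eff, sigma_sm, br_sm)
rel_err = {("sigma", "ggF"): 0.10, ("sigma", "VBF"): 0.03,
           ("Gamma", "WW"): 0.01, ("Gamma", "bb"): 0.02}            # hypothetical Delta's
print(coeff)
print(quadrature_error(coeff, rel_err))
```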

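Finally, the errors on the cross sections and partial widths come from several sources. One can write them generically as

$$\delta_X^\sigma\Delta_X^\sigma = \sum_n \delta_{X,n}^\sigma\Delta_{X,n}^\sigma\,, \qquad (5.9)$$

$$\delta_Y^\Gamma\Delta_Y^\Gamma = \sum_{n'} \delta_{Y,n'}^\Gamma\Delta_{Y,n'}^\Gamma\,, \qquad (5.10)$$

with the relative errors $\Delta_{X,n}^\sigma$, $\Delta_{Y,n'}^\Gamma$ to be detailed in the following.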

We now know the base likelihood of Eq. (5.1) and where exactly the theoretical uncertainties enter. This gives the complete Higgs likelihood as a function of all the quantities that will have to be treated statistically, namely the nuisance parameters and the effective BSM parameters,

$$L_0\big(\mu_i[c_V,c_f];\ \hat\mu_i(1 + \delta_i\Delta_i)\big) = L_0\big(c_V, c_f, \delta_{X,n}^\sigma, \delta_{Y,n'}^\Gamma\big)\,. \qquad (5.11)$$

Rigorously, the next step is to eliminate the nuisance parameters $\delta_{X,n}^\sigma$, $\delta_{Y,n'}^\Gamma$ by applying either the marginalisation or the bias method. In general these steps have to be performed numerically and are computationally heavy. Here, however, we will use the preliminary combinations advocated in Section 3. The resulting Higgs likelihoods will then turn out to be much lighter to treat.


6 Combining the Higgs rate uncertainties 

In this section we combine the Higgs rate uncertainties that will be used in the marginal likelihood studied in Section 7. The clearest and most rigorous statistical context

$\delta_{YY'}$ represents the Kronecker symbol.

Throughout the paper, we will systematically denote the values of A^:, taken from the literature 
by A |o or A The possible ambiguities in the interpretation of these numbers will be discussed case by 
case. 

In the following, to keep the notation compact, we will omit the $c_V$, $c_f$ arguments of the likelihood function when no ambiguity is possible.
for the marginalisation procedure is arguably that of Bayesian statistics. In particular, the nuisance parameters are treated on the same footing as the variables of interest and are thus automatically given a probability distribution (see for instance Ref. [45]). For that reason, we focus in this section on the error combinations within the Bayesian context. The resulting likelihood involving the combined errors will be formally treated within both the Bayesian and frequentist marginalisations in Section 7.

As described in Section 2.2.1, the Bayesian marginalisation procedure eliminates the dependence of the likelihood on the nuisance parameters through an integration. For the Higgs likelihood of Eq. (5.11), this integration reads



where $\pi_0$ is the joint prior of all the nuisance parameters. Recall that this prior factorises when the parameters are independent. More explicitly, this marginal likelihood reads



The theoretical uncertainties $\delta_i\Delta_i$ on each signal strength $\hat\mu_i$ are expressed in terms of the



In the following subsections, starting from Eq. (6.2), we will combine all the sources 
of uncertainty step-by-step, following the combination formalism established in Section 3. 
The aim of this section is to provide a clear and exhaustive treatment of all the Higgs 
theoretical uncertainties. 

6.1 Combining the PDF and $\alpha_s$ uncertainties

Let us first discuss the errors on the QCD predictions for the Higgs production cross sections at the proton level. Those are induced by the uncertainties on the parton distribution functions (PDF) inside the proton. First, one may distinguish between two distinct origins of the PDF uncertainties. One is an experimental source, as the PDF are reconstructed from collider data. The other is the choice of a specific PDF set (MSTW, CT/CTEQ, NNPDF...). Second, we consider simultaneously the parametric uncertainty coming from the strong coupling constant, $\alpha_s$. We treat the PDF and $\alpha_s$ uncertainties together because they contribute in an intricate way to the cross section, as $\alpha_s$ enters both the hard process matrix element and the PDF themselves.

• Modeling the uncertainties:

The uncertainties from $\alpha_s$ and from the collider data are modeled by the nuisance parameters $\delta^{\alpha_s}$, $\delta_X^{\rm data}$, and constitute independent sources of uncertainty (hence with factorisable priors). The relative uncertainties on $\alpha_s$ and on the PDF data can be parametrised as

$$\alpha_s\,(1 + \delta^{\alpha_s}\Delta^{\alpha_s})\,, \qquad {\rm data}\,(1 + \delta_X^{\rm data}\Delta^{\rm data})\,. \qquad (6.3)$$

The $\alpha_s$ error enters the cross section in two different ways. On the one hand, $\alpha_s$ is used in the fit of the data aimed at determining the PDF themselves. On the other hand, $\alpha_s$ is also involved in the hard subprocess that is convoluted with the PDF to obtain the final cross section. These two contributions to the cross section uncertainty, named here $\Delta_X^{\alpha_s,{\rm PDF}}$ and $\Delta_X^{\alpha_s,{\rm hard}}$, are not separately available in the literature. However, we will show that the knowledge of these two separate contributions is not necessary. The relative errors $\Delta^{\alpha_s}$ and $\Delta^{\rm data}$ are assumed small enough to be linearised. Under this condition, only the sum $\Delta_X^{\alpha_s,{\rm PDF}} + \Delta_X^{\alpha_s,{\rm hard}}$ is needed, and this sum can typically be inferred from the literature.

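In order to understand the interplay between the $\alpha_s$ and the data uncertainties, it is instructive to write explicitly how they enter the cross section. One should start with the form

$$\sigma_X^{\rm SM}\big[f_{\rm PDF}[\alpha_s, {\rm data}],\, \alpha_s\big]\,, \qquad (6.4)$$

where the first argument corresponds to the PDF input, while the second argument represents the $\alpha_s$-dependence coming from the partonic process. From this general form, one then introduces the $\delta^{\alpha_s}$ and $\delta_X^{\rm data}$ nuisance parameters, and expands the expression at first order,

$$\begin{aligned}
\sigma_X^{\rm SM}\big[f_{\rm PDF}[\alpha_s(1+\delta^{\alpha_s}\Delta^{\alpha_s}),\,{\rm data}(1+\delta_X^{\rm data}\Delta^{\rm data})],\,\alpha_s(1+\delta^{\alpha_s}\Delta^{\alpha_s})\big] ={}& \sigma_X^{\rm SM}\big[f_{\rm PDF}[\alpha_s,{\rm data}],\,\alpha_s\big] \\
&+ \delta^{\alpha_s}\big(\partial_1\sigma_X^{\rm SM}\,\partial_1 f_{\rm PDF}\,\alpha_s\Delta^{\alpha_s}\big) + \delta_X^{\rm data}\big(\partial_1\sigma_X^{\rm SM}\,\partial_2 f_{\rm PDF}\,{\rm data}\,\Delta^{\rm data}\big) \\
&+ \delta^{\alpha_s}\big(\partial_2\sigma_X^{\rm SM}\,\alpha_s\Delta^{\alpha_s}\big) + \mathcal{O}(\Delta^2)\,.
\end{aligned} \qquad (6.5)$$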

The terms in the last two lines represent the errors propagated to the cross section at first order in $\Delta$, expressed through partial derivatives of $\sigma_X^{\rm SM}$. Divided by $\sigma_X^{\rm SM}$, they correspond precisely to the relative errors on the cross section,

$$\delta^{\alpha_s}\Delta_X^{\alpha_s,{\rm PDF}} + \delta_X^{\rm data}\Delta_X^{\rm data} + \delta^{\alpha_s}\Delta_X^{\alpha_s,{\rm hard}}\,. \qquad (6.6)$$

It clearly appears that only the sum $\Delta_X^{\alpha_s,{\rm PDF}} + \Delta_X^{\alpha_s,{\rm hard}}$ is needed. Fortunately, this is what is provided in the literature: the sum $\Delta_X^{\alpha_s} = \Delta_X^{\alpha_s,{\rm PDF}} + \Delta_X^{\alpha_s,{\rm hard}}$ can be read, for example, from Ref. [20]. Note also that the nuisance parameter $\delta^{\alpha_s}$ is common to all production modes, i.e. it does not carry the index $X$. In contrast, the nuisance parameter $\delta_X^{\rm data}$ carries an index $X$ because each production mode potentially involves different initial states. These initial states correspond to different PDF, which are fitted from different data sets.

$\partial_{1,2}$ denote derivatives with respect to the first and second argument of a function, respectively: $\partial_1 f = \partial f(x,y)/\partial x$, $\partial_2 f = \partial f(x,y)/\partial y$.

Note that the $\Delta$'s in Eq. (6.6) can be negative, as they are identified from the partial derivatives in Eq. (6.5). In the rest of the paper, however, the $\Delta$'s are taken positive by convention. Different signs for the $\Delta$'s would correspond to a negative correlation, which is instead included at the level of the $\delta$'s.

Finally, one should check the validity of the error propagation at linear order in the cross sections, i.e. that the $\mathcal{O}(\Delta^2)$ terms in Eq. (6.5) are indeed negligible. Consider Eqs. (6.5)-(6.6) at linear order, for any fixed value of $\alpha_s$ (i.e. a fixed value of $\delta^{\alpha_s}$). The error bar on $\sigma_X^{\rm SM}$ induced by the data uncertainty, obtained by varying $\delta_X^{\rm data}$ in $[-1,1]$, should then have the same size. A change of this bar size with $\alpha_s$ could thus come only from higher-order terms, such as the mixed terms proportional to $\delta^{\alpha_s}\delta_X^{\rm data}\Delta^{\alpha_s}\Delta^{\rm data}$. Figs. 57, 58 and 59 of Ref. [20] show the various Higgs production reactions at the 8 TeV LHC. There, the change of this bar size (vertical bars) is small with respect to the shift of the bar central values, i.e. $\Delta_X^{\alpha_s}$. We conclude that one can restrict the expansion of Eq. (6.5) to linear order to a good approximation.

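Notice that a customary way to write these uncertainties is by splitting them between the overall PDF error and the hard subprocess error, $\delta_X^{\rm PDF}\Delta_X^{\rm PDF}+\delta^{\alpha_s,{\rm hard}}\Delta_X^{\alpha_s,{\rm hard}}$, with $\delta_X^{\rm PDF}\Delta_X^{\rm PDF}=\delta^{\alpha_s}\Delta_X^{\alpha_s,{\rm PDF}}+\delta_X^{\rm data}\Delta_X^{\rm data}$. The trouble with this form is that the $\delta_X^{\rm PDF}$ and $\delta^{\alpha_s,{\rm hard}}$ contributions are correlated via $\alpha_s$. Combining these uncertainties then requires knowing this correlation coefficient, which is fixed by $\Delta_X^{\alpha_s,{\rm PDF}}$ as well as $\Delta_X^{\rm data}$. We emphasize that the use of this intermediate parametrisation brings unnecessary complications, and we thus recommend avoiding it.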

Hence, according to Eq. (6.6), the parametric uncertainties from $\alpha_s$ are cast into a single error $\Delta_X^{\alpha_s}$, and add up with the statistical error from the data as

$$\delta^{\alpha_s}\Delta_X^{\alpha_s}+\delta_X^{\rm data}\Delta_X^{\rm data}\,. \tag{6.7}$$

Using this approach, one deals directly with the elementary sources of uncertainty. These two sources of error have no intrinsic relation and are thus independent, meaning that $\delta^{\alpha_s}$ and $\delta_X^{\rm data}$ have factorisable priors.

Similarly, the uncertainty from the choice of a specific PDF set, modelled by $\delta_X^{\rm set}\Delta_X^{\rm set}$, can be added linearly to the errors of Eq. (6.7) to a good approximation. The linear approximation can be justified from Fig. (57) in Ref. [20], where one can see that the size of the data error bars, as well as the shifts induced by $\alpha_s$, depend only weakly on the PDF set choice. The $\delta_X^{\rm set}$ error is also independent of the $\delta^{\alpha_s}$ and $\delta_X^{\rm data}$ errors and in turn possesses its own prior distribution. All those errors induce three terms in the sum of theoretical errors entering Eq. (5.9). These terms can be cast into a global PDF uncertainty,

$$\delta_X^{{\rm PDF}+\alpha_s}\Delta_X^{{\rm PDF}+\alpha_s}=\delta^{\alpha_s}\Delta_X^{\alpha_s}+\delta_X^{\rm data}\Delta_X^{\rm data}+\delta_X^{\rm set}\Delta_X^{\rm set}\,. \tag{6.8}$$

We recall that X = {ggF, VBF, VH, ttH} and that the $\Delta$'s are relative errors, which are chosen by convention to correspond to one standard deviation. They are related to the 1σ absolute errors on the SM Higgs cross section through, e.g.,

$$\Delta_X^{\rm data}=\frac{\Delta\sigma_X^{\rm data}}{\sigma_X^{\rm SM}}\,.$$
• Combining the three uncertainties:

Here we combine the three sources of theoretical uncertainty described in Eq. (6.8). We will add up more and more errors progressively in the following subsections. These three independent sources of error are associated with three priors, $\pi^{\alpha_s}$, $\pi_X^{\rm data}$ and $\pi_X^{\rm set}$. These nuisance parameters appear in Eq. (6.2), where they are integrated over. We now proceed to combine these errors following the analysis of Section 3, starting from Eq. (3.2). In practice, for the discussion, it will be convenient to combine only two errors at a time. One then finds a likelihood of the type (6.2) depending only on the nuisance parameter $\delta_X^{{\rm PDF}+\alpha_s}$. The distribution of this nuisance parameter comes with a 1σ width given by

$$(\Delta_X^{{\rm PDF}+\alpha_s})^2=(\Delta_X^{\rm data})^2+(\Delta_X^{\alpha_s})^2+(\Delta_X^{\rm set})^2\,. \tag{6.9}$$

The nuisance parameter $\delta_X^{{\rm PDF}+\alpha_s}$ obeys a new prior obtained via two successive convolutions of the initial priors (as in Eqs. (3.3)-(3.4)-(3.5)),

$$\bar\pi_X^{{\rm PDF}+\alpha_s}=\bar\pi_X^{\rm set}*\bar\pi_X^{\rm data}*\bar\pi_X^{\alpha_s}\,, \tag{6.10}$$

where $\bar\pi_X^{{\rm PDF}+\alpha_s}(x)=\pi_X^{{\rm PDF}+\alpha_s}(x/\Delta_X^{{\rm PDF}+\alpha_s})$ and the variable x corresponds to the relative error $\delta_X^{{\rm PDF}+\alpha_s}\Delta_X^{{\rm PDF}+\alpha_s}$. For the initial priors one has, for example, $\bar\pi_X^{\alpha_s}(x)=\pi^{\alpha_s}(x/\Delta_X^{\alpha_s})$.

Equations (6.9) and (6.10) are justified in detail in the rest of this subsection.
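The distribution of a sum of independent nuisance terms is the convolution of their densities, so sampling the sum is a Monte Carlo stand-in for Eq. (6.10). The minimal Python sketch below, with the hypothetical inputs indicated in the comments, checks that the width of the combined error reproduces the quadrature sum of Eq. (6.9) whatever the prior shapes. Flat priors are given unit variance through the half-width $\sqrt3$; the names `draw`, `sampled_width` and `quadrature_width` are ours.

```python
import math
import random

SQRT3 = math.sqrt(3.0)


def draw(shape, rng):
    """Unit-variance draw from a 'flat' or 'gauss' prior."""
    if shape == "flat":
        return rng.uniform(-SQRT3, SQRT3)
    if shape == "gauss":
        return rng.gauss(0.0, 1.0)
    raise ValueError(f"unknown prior shape: {shape}")


def sampled_width(errors, n=50_000, seed=7):
    """Standard deviation of sum_k delta_k * Delta_k for independent delta_k.

    `errors` is a list of (Delta_k, shape) pairs."""
    rng = random.Random(seed)
    xs = [sum(d * draw(shape, rng) for d, shape in errors) for _ in range(n)]
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / n)


def quadrature_width(errors):
    """Eq. (6.9): the 1-sigma widths of independent errors add in quadrature."""
    return math.sqrt(sum(d * d for d, _ in errors))


# Hypothetical relative errors (in %): alpha_s (flat), data (Gaussian), set (flat).
errs = [(2.0, "flat"), (2.0, "gauss"), (1.5, "flat")]
print(quadrature_width(errs))  # sqrt(4 + 4 + 2.25), i.e. about 3.20
print(sampled_width(errs))     # agrees up to sampling noise
# Turning every prior into a Gaussian leaves the width unchanged:
print(sampled_width([(d, "gauss") for d, _ in errs]))
```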


• Details on the data and $\alpha_s$ error combinations:

We emphasize that the Bayesian combination of the 1σ widths, as here in Eq. (6.9), is independent of the shapes of the prior distributions. This combination only depends on the possible correlations among individual errors [cf. Section 3.2]. In the present case, there is no correlation between the $\delta^{\alpha_s}$ and $\delta_X^{\rm data}$ parameters, as explained right below Eq. (6.7). This leads to the sum in quadrature of the 1σ errors, $(\Delta_X^{\rm data})^2+(\Delta_X^{\alpha_s})^2$, in Eq. (6.9).

Let us comment on those uncertainties. First, the error associated with $\delta_X^{\rm data}$ originates mainly from measurements: it is mainly induced by the limited accuracy of the data points used to perform the fit for reconstructing the PDFs. Hence this error is mostly statistical in nature. There are of course systematic errors as well, but it has been checked by several groups that the final distribution can reasonably be taken as Gaussian [18].

Second, the uncertainty on $\alpha_s$ originates mainly from lattice calculation errors (mainly theoretical), and especially from perturbative truncation errors [46]. Indeed, the $\alpha_s$ determination from lattice methods (the most accurate one in Ref. [46]) represents today the most precise determination and hence essentially dictates the final world-average error [47]. The FLAG Working Group on lattice calculations has estimated a more conservative uncertainty on $\alpha_s$, which is increased by a new QCD perturbative error estimation [48], thus still leading to a dominant theoretical uncertainty.

At this level, a comment is needed on the link between the 1σ errors and the uncertainty magnitudes provided in the literature. To remain conservative, we use $\Delta_X^{\alpha_s}=\Delta_X^{\alpha_s}|_0$ for the 1σ error, where $\Delta_X^{\alpha_s}|_0$ is the error provided by Refs. [17, 20]. There is indeed a somewhat arbitrary choice for the relation between $\Delta_X^{\alpha_s}$ and $\Delta_X^{\alpha_s}|_0$, due to the theoretical (QCD) nature of the uncertainty. The origin of this arbitrariness is that QCD errors are simply estimated by varying the renormalisation and factorisation scales over arbitrary intervals. We present a similar discussion at the beginning of the next Section (6.2) for the scale uncertainty. Concerning the 1σ error from data, one can adopt $\Delta_X^{\rm data}=\Delta_X^{\rm data}|_0$ ($\Delta_X^{\rm data}|_0$ being read from Refs. [17, 20]). Indeed, the probability distribution for the uncertainty induced by the experimental data can safely be described by a Gaussian, as described above, so that the errors provided by Refs. [17, 20] can reasonably be interpreted as 1σ errors.

The only source of experimental error is, and is minor, as can be read from Table IV of Ref. [46].

Let us now discuss the convolution between $\pi_X^{\rm data}$ and $\pi^{\alpha_s}$ that appears in Eq. (6.10). For that purpose, we first need to discuss the form of the $\pi^{\alpha_s}$ distribution. The shape of $\pi^{\alpha_s}$ can be taken as flat, since the uncertainty on $\alpha_s$ originates mainly from theoretical uncertainty, as mentioned above. However, the choice of the prior for a theoretical uncertainty is often controversial, so we will also consider the case of a non-flat $\pi^{\alpha_s}$ distribution.

Finally, the convolution of the Gaussian prior $\pi_X^{\rm data}$ with a flat prior $\pi^{\alpha_s}$ gives rise, to a good approximation, to a Gaussian distribution, $\bar\pi_X^{\rm data}*\bar\pi_X^{\alpha_s}$, for the various Higgs production modes. The justification is that the width $\Delta_X^{\alpha_s}$ is systematically smaller than or of the same order as $\Delta_X^{\rm data}$, in which case the convolution leads to an almost pure Gaussian prior. This will be demonstrated explicitly in Fig. (2) for other priors.
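One simple diagnostic of how "nearly Gaussian" such a convolution is (our choice of measure, not one used in the text) is the excess kurtosis of the combined error. It vanishes for a Gaussian, equals $-6/5$ for a flat prior and, because the cumulants of independent variables add, a combination of terms $\delta_k\Delta_k$ has $\kappa=\sum_k\kappa_k\Delta_k^4/(\sum_k\Delta_k^2)^2$. The sketch below evaluates it for a Gaussian data error combined with a flat $\alpha_s$ error, as a function of the hypothetical ratio $r=\Delta_X^{\alpha_s}/\Delta_X^{\rm data}$.

```python
FLAT_EXCESS_KURTOSIS = -6.0 / 5.0  # uniform distribution
GAUSS_EXCESS_KURTOSIS = 0.0


def combined_excess_kurtosis(terms):
    """Excess kurtosis of sum_k delta_k * Delta_k for independent delta_k.

    `terms` is a list of (Delta_k, excess_kurtosis_k). Uses additivity of the
    second and fourth cumulants: k2 = sum Delta_k^2, k4 = sum kappa_k Delta_k^4."""
    k2 = sum(d ** 2 for d, _ in terms)
    k4 = sum(kurt * d ** 4 for d, kurt in terms)
    return k4 / k2 ** 2


# Gaussian data error (width 1) plus a flat alpha_s error of width r.
for r in (0.0, 0.5, 1.0):
    kappa = combined_excess_kurtosis([(1.0, GAUSS_EXCESS_KURTOSIS),
                                      (r, FLAT_EXCESS_KURTOSIS)])
    print(f"r = {r:.1f}: excess kurtosis = {kappa:+.3f}")
# prints +0.000, -0.048 and -0.300 for r = 0, 0.5 and 1
```

Even for $r=1$ the combined shape is four times closer to a Gaussian than a single flat prior by this measure, in line with the argument above.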

• Details on the combination with the PDF set error:

The various PDF estimations provided by the different fitting groups reflect several sources of error [49-51]. Indeed, these groups make different choices/hypotheses about the numbers of free parameters used to model the PDFs, the statistical methods adopted to fit the data, the number of independently parametrised PDFs (in particular regarding (anti-)strangeness), the collider results exploited, the matching methods applied to include heavy-quark mass effects in the flavour number scheme, and the variable- or fixed-flavour number scheme. All these sources of uncertainty are synthesized in the 1σ error on the Higgs production rates, noted $\Delta_X^{\rm set}$. To remain conservative, we assume $\Delta_X^{\rm set}=\Delta_X^{\rm set}|_0$, where $\Delta_X^{\rm set}|_0$ is the error read from Figs. (57)-(59) of Ref. [20]. $\Delta_X^{\rm set}|_0$ can be estimated by taking half the interval obtained by using the various PDF sets, which lead to a finite number of predictions for the Higgs rate central values. Of course, this determination of $\Delta_X^{\rm set}$ is probably underestimated, as (i) the hypotheses made by the groups provide illustrative examples which do not necessarily indicate the extremal values of the PDFs, and (ii) the effects of the various sources of error listed above can potentially compensate each other. We comment on this point in the following paragraph.

To be consistent throughout the paper, concerning the initial priors, we will assume a flat shape for the distributions whose shape is unknown (uncertainties from QCD, parametrisation...).

For the ggF example, our conservative treatment of the errors provided in Fig. (59) of Ref. [20] gives a half absolute width $W/2=\sqrt3\,\Delta\sigma_{\rm ggF}^{\alpha_s}=\sqrt3\,\Delta\sigma_{\rm ggF}^{\alpha_s}|_0\simeq0.5$ pb, which is indeed comparable to $\Delta\sigma_{\rm ggF}^{\rm data}=\Delta\sigma_{\rm ggF}^{\rm data}|_0\simeq0.5$ pb. In the alternative case (see the analogous discussion at the start of Section 6.2), one has instead $W/2=\sqrt3\,\Delta\sigma_{\rm ggF}^{\alpha_s}=\Delta\sigma_{\rm ggF}^{\alpha_s}|_0\simeq0.3$ pb, which is clearly smaller than $\Delta\sigma_{\rm ggF}^{\rm data}\simeq0.5$ pb, so that the Gaussian approximation for the final convolution would be even better, because this case tends towards a situation where the non-Gaussian error becomes negligible.

The infinite-dimensional problem of representing a space of functions is reduced to a finite-dimensional form, in order to be manageable, by introducing a parametrisation of the PDFs.

There exist mainly two classes of methodology currently used to determine a confidence interval represented in the space of functions: some variations of the Hessian approach (multi-Gaussian probability distributions) and the Monte Carlo approach. Both types of methods have their own limitations.

In Eq. (6.9), the sum in quadrature between the $\Delta_X^{\rm set}$ error and the data and $\alpha_s$ errors is justified because these are independent uncertainties. Nevertheless, in practice, for our numerical applications, we use the so-called envelope method to determine $\Delta_X^{{\rm PDF}+\alpha_s}$, as done in Refs. [20, 52] and calculated by the LHCHWG [17]. Note that the envelope method overestimates the combined errors, somewhat compensating for the underestimation of the PDF set error. For the ggF mechanism, the error $\Delta_{\rm ggF}^{{\rm PDF}+\alpha_s}$ derived in this way has to be reduced by ~40% to recover the quadrature summation of Eq. (6.9), and the decrease is smaller for the other Higgs production reactions. Hence, we conclude that the use of the envelope method to determine the global PDF uncertainties gives rise to a substantial overestimation of these errors.
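The size of such an overestimation can be illustrated in an idealised setting (our assumption, not the LHCHWG procedure itself): take independent relative errors entering the rate linearly, build the envelope by scanning the corners $\delta_k=\pm1$, and compare its half-width with the quadrature sum of Eq. (6.9). For three equal errors the required reduction is $1-1/\sqrt3\simeq42\%$, of the same order as the ~40% quoted above for ggF; the error values in this sketch are hypothetical.

```python
import itertools
import math


def envelope_half_width(deltas):
    """Half of the [min, max] interval of the linear rate model
    sigma/sigma0 = 1 + sum_k s_k * Delta_k, scanned over the corners s_k = +-1."""
    rates = [1.0 + sum(s * d for s, d in zip(signs, deltas))
             for signs in itertools.product((-1.0, 1.0), repeat=len(deltas))]
    return (max(rates) - min(rates)) / 2.0


def quadrature_width(deltas):
    """Quadrature sum of independent 1-sigma relative errors, as in Eq. (6.9)."""
    return math.sqrt(sum(d * d for d in deltas))


# Hypothetical, equal 2% relative errors for alpha_s, data and PDF set.
deltas = [0.02, 0.02, 0.02]
env = envelope_half_width(deltas)  # linear sum, 0.06
quad = quadrature_width(deltas)    # sqrt(3) * 0.02, about 0.035
print(f"required reduction: {1 - quad / env:.0%}")  # 42%
```

For unequal errors the largest one dominates both sums and the required reduction shrinks.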

We finally discuss the shape of the prior of the final combination, $\pi_X^{{\rm PDF}+\alpha_s}$. Most of the sources of error taken into account in $\Delta_X^{\rm set}$ are theoretical in nature, and all these errors have unknown distributions. The shape of $\pi_X^{\rm set}$ is therefore assumed to be flat. The convolution of $\pi_X^{\rm set}$ (see Eq. (6.10)) with the nearly Gaussian distribution $\bar\pi_X^{\rm data}*\bar\pi_X^{\alpha_s}$ leads, to a good approximation, to a final Gaussian prior $\pi_X^{{\rm PDF}+\alpha_s}$. Once more, this is guaranteed by the fact that, for any Higgs production mode at the LHC, $\Delta_X^{\rm set}$ is smaller than or comparable to the combination of $\Delta_X^{\rm data}$ and $\Delta_X^{\alpha_s}$ (see for instance Ref. [20]).

6.2 Scale and EFT errors: the amplitude uncertainties 

• Scale error: 

There exists another major type of error, this time at the parton level, on the QCD prediction for Higgs production cross sections. It originates from the lack of knowledge of the higher-order contributions to the amplitude in the perturbative expansion, and can be recast into the dependence on the QCD renormalisation and factorisation scales. We denote by $\delta_X^{\rm scale}$ the nuisance parameter representing this "scale uncertainty".

There are no strong arguments for choosing the shape of $\pi_X^{\rm scale}$. As for many other theoretical uncertainties, the choice of the prior is typically a subject of controversy. Here we choose a flat $\pi_X^{\rm scale}$. Concerning the magnitude of the scale uncertainty, it is also not clear to which width exactly the provided value, noted $\Delta_X|_0$ here and found in Refs. [17, 18, 25], corresponds. It is reasonable to expect $\Delta_X^{\rm scale}$ to be of order $\Delta_X|_0$. To be more precise, we could make two different assumptions, $\Delta_X^{\rm scale}=\Delta_X|_0$ or $\Delta_X|_0=W/2$, where W is defined as the size of the support of the distribution; e.g., in the case of a flat distribution on an interval of size W, the latter gives $2\Delta_X^{\rm scale}=W/\sqrt3=2\Delta_X|_0/\sqrt3$. In order to be conservative in the choice of $\Delta_X^{\rm scale}$, we choose the former hypothesis throughout this paper: $\Delta_X^{\rm scale}=\Delta_X|_0$.

This "envelope method" corresponds precisely to the uncertainty combinations in the bias approach, see Section 3.5. What we call envelope method in the present paper is rather described in Section 2.2.2.

In the envelope method used in this reference, the whole uncertainty interval is found by searching for the minimum and maximum rates (considering the various PDF sets and $\alpha_s$ values, and including the possibility to move along the data error bars). Dividing this interval by two then gives an estimate of the combined error, as well as a central value for the rate.

Given that there are several sources of error contained in the PDF set uncertainty, one may expect the $\pi_X^{\rm set}$ prior to be somewhat peaked. This feature improves the Gaussian approximation of $\pi_X^{{\rm PDF}+\alpha_s}$ even further.
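The two conventions can be compared with a few lines of Python. This is only a sketch of the arithmetic above, with hypothetical function names, applied to the ggF scale error of 4.16% quoted in the next paragraph; it uses the fact that a flat prior on an interval of length W has standard deviation $W/(2\sqrt3)$.

```python
import math


def flat_one_sigma(support_width):
    """Standard deviation of a flat prior on an interval of length W: W / (2*sqrt(3))."""
    return support_width / (2.0 * math.sqrt(3.0))


def scale_sigma(delta0, convention):
    """1-sigma width Delta^scale assigned to a quoted scale error Delta|_0.

    'conservative': Delta^scale = Delta|_0 (the choice made in the text);
    'half_support': Delta|_0 = W/2, hence Delta^scale = Delta|_0 / sqrt(3)."""
    if convention == "conservative":
        return delta0
    if convention == "half_support":
        return flat_one_sigma(2.0 * delta0)
    raise ValueError(f"unknown convention: {convention}")


# Symmetrized N3LO ggF scale error quoted in the text (mu_0 = m_H): 4.16%.
for conv in ("conservative", "half_support"):
    sigma = scale_sigma(4.16, conv)
    # half-width of the corresponding flat support is sqrt(3) * sigma
    print(f"{conv}: Delta^scale = {sigma:.2f}%, support = +/-{math.sqrt(3) * sigma:.2f}%")
```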

It is remarkable that the calculation for the ggF mechanism has recently been pushed up to the complete N$^3$LO order in perturbative QCD [25]. This has allowed a reduction of the symmetrized scale error from $\Delta_{\rm ggF}\simeq7.51\%$ (with the renormalisation/factorisation scale $\mu_0=m_H/2$, to absorb some of the soft-gluon resummation corrections [53]) [17, 18], down to $\Delta_{\rm ggF}\simeq4.16\%$ (with $\mu_0=m_H$) [25]. In both cases the error was obtained by spanning the interval $[\mu_0/2,\,2\mu_0]$ for the renormalisation/factorisation scale $\mu=\mu_R=\mu_F$, at an energy $\sqrt s=8$ TeV and for $m_H\simeq125.2$ GeV.



Figure 1: Probability density distribution, $\pi_{\rm ggF}^{\rm amp}(x/\Delta_{\rm ggF}^{\rm amp})$ (in red), involving the relative error x (in %) of the ggF cross section, as derived through the convolution of the $\pi_{\rm ggF}^{\rm scale}$ and $\pi_{\rm ggF}^{\rm EFT}$ priors (both in blue). The quantity $\Delta_{\rm ggF}^{\rm amp}$ represents the relative 1σ error on the Higgs production rate (see text). For better comparison, the normalisation is chosen such that all the functions possess the same maximum, equal to unity at the origin.


• EFT error: 

In the specific case of the ggF mechanism, another source of error arises in the amplitude of the Higgs production [54], which we describe now. The evaluation of this amplitude beyond the NLO level is possible within the Effective Field Theory (EFT) approach, where the particles running in the triangle loop are assumed to be much heavier than the produced Higgs boson, so that these heavy particles can be integrated out.

For the top quark exchange, the infinite mass assumption, $m_t\gg m_H$, induces a negligible error on the ggF amplitude [27, 55]. In contrast, the EFT approach is clearly not valid for the other significant ggF contribution: the bottom quark exchange [25]. This inappropriate use of the EFT limit introduces some non-negligible error, mainly through the interference between the bottom and the dominant top quark loops (this error being smaller at the Tevatron than at the LHC) [56].

Recall that the support of a distribution is the domain where this distribution is not zero-valued.

Symmetrized over the positive and negative errors as $\Delta=[(\Delta_+^2+\Delta_-^2)/2]^{1/2}$.

Choosing instead $\mu_0=m_H/2$ could be motivated by a faster convergence of the perturbative series [25]. However, since it would lead to a significantly smaller uncertainty, $\Delta_{\rm ggF}\simeq2.13\%$, we stick to the central choice $\mu_0=m_H$, in order to remain conservative.

A similar uncertainty originates from the mixed QCD-EW corrections to the ggF process [27]. These have been calculated at NNLO via the EFT approach, based on the simplifying but unrealistic assumption $M_{W,Z}\gg m_H$. For all the EFT errors, some approximate estimations can be computed at NNLO (using K-factors obtained at NLO and NNLO for the top loop) [26, 55].

A related uncertainty comes from the freedom in the choice of a renormalisation scheme for the bottom quark mass involved in the ggF amplitude (on-shell scheme, $\overline{\rm MS}$ scheme...). The error from the renormalisation scheme dependence can be approximately estimated at NLO [55].

These three sources of theoretical uncertainty, namely the two kinds of EFT assumptions (on the heavy quark masses, $m_Q$ (Q = b, t), and on the vector boson masses, $M_V$ (V = W, Z)) and the $m_b$ scheme dependence, are independent and their respective priors are unknown. We assume these priors to be flat. To be conservative, we take the three 1σ errors to be equal to the numbers estimated in Refs. [26, 55] for the 8 TeV LHC. Summing those in quadrature gives rise to the relative rate error $\Delta_{\rm ggF}^{\rm EFT}\simeq5.6\%$. The convolution of the three flat priors (according to Eq. (3.5)) leads to the blue distribution $\bar\pi_{\rm ggF}^{\rm EFT}$ shown in Fig. (1), which already resembles a Gaussian shape, as predicted by the central limit theorem.
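The convolution of Eq. (3.5) can also be carried out numerically on a grid. The sketch below is our own illustration: it convolutes three flat priors with hypothetical, equal 1σ widths of 3% (the text only quotes their quadrature sum), and compares the result with a Gaussian of the same width. The grid spacing and range are arbitrary choices, and the small difference between the numerical and quadrature widths is a discretisation effect.

```python
import math


def flat_density(half_width, grid):
    """Flat prior on [-half_width, half_width], sampled on `grid` (unnormalised)."""
    return [1.0 if abs(x) <= half_width else 0.0 for x in grid]


def normalise(p, dx):
    total = sum(p) * dx
    return [v / total for v in p]


def convolve_same(f, g, dx):
    """Discrete (f*g)(x) = integral f(y) g(x-y) dy on a symmetric grid with an
    odd number of points; the output lives on the same grid."""
    n = len(f)
    c = (n - 1) // 2
    out = [0.0] * n
    for k in range(n):
        m = k + c  # index into the full convolution of length 2n-1
        acc = 0.0
        for j in range(max(0, m - n + 1), min(n, m + 1)):
            acc += f[j] * g[m - j]
        out[k] = acc * dx
    return out


def std(p, grid, dx):
    """Standard deviation of a density centred at zero."""
    return math.sqrt(sum(x * x * v for x, v in zip(grid, p)) * dx)


dx = 0.05
grid = [(k - 400) * dx for k in range(801)]  # relative error x in %, from -20 to 20
widths = [3.0, 3.0, 3.0]                     # hypothetical 1-sigma EFT errors, in %
priors = [normalise(flat_density(math.sqrt(3.0) * w, grid), dx) for w in widths]

combined = priors[0]
for p in priors[1:]:
    combined = convolve_same(combined, p, dx)

sigma = std(combined, grid, dx)
gauss = [math.exp(-x * x / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi)) for x in grid]
max_dev = max(abs(a - b) for a, b in zip(combined, gauss)) / max(gauss)
print(f"width: {sigma:.2f}% (quadrature: {math.sqrt(sum(w * w for w in widths)):.2f}%)")
print(f"max deviation from a Gaussian of equal width: {max_dev:.1%} of its peak")
```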


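• Combining the scale and EFT errors:

The theoretical scale and EFT uncertainties on the ggF mechanism are of different nature and are thus independent. The combined ggF 1σ error is in turn given by

$$(\Delta_{\rm ggF}^{\rm amp})^2=(\Delta_{\rm ggF}^{\rm scale})^2+(\Delta_{\rm ggF}^{\rm EFT})^2\,. \tag{6.11}$$

This error constitutes the characteristic width of the $\pi_{\rm ggF}^{\rm amp}$ distribution obtained by convoluting the $\pi_{\rm ggF}^{\rm scale}$ and $\pi_{\rm ggF}^{\rm EFT}$ priors, as performed in Fig. (1) (see the final red curve). Remarkably, this distribution,

$$\bar\pi_{\rm ggF}^{\rm amp}=\bar\pi_{\rm ggF}^{\rm scale}*\bar\pi_{\rm ggF}^{\rm EFT}\,, \tag{6.12}$$

derived from four purely flat priors, is Gaussian to a good approximation. This can also be seen in Fig. (3), where $\pi_{\rm ggF}^{\rm amp}$ is plotted together with a pure Gaussian distribution (blue curves). Recall that $\bar\pi_{\rm ggF}^{\rm amp}(x)=\pi_{\rm ggF}^{\rm amp}(x/\Delta_{\rm ggF}^{\rm amp})$ and the variable x corresponds to $\delta_{\rm ggF}^{\rm amp}\Delta_{\rm ggF}^{\rm amp}$.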

6.3 Combination of the PDF and amplitude errors 

For the various Higgs production modes, except the ggF process that will be discussed separately below, one has to combine the PDF and scale errors to determine the final uncertainty on the whole cross section. The scale error adds up to the PDF error of Eq. (6.8), according to Eq. (5.9), defining the total uncertainty on the cross section,

$$\delta_X\Delta_X=\delta_X^{{\rm PDF}+\alpha_s}\Delta_X^{{\rm PDF}+\alpha_s}+\delta_X^{\rm scale}\Delta_X^{\rm scale}\,. \tag{6.13}$$

These errors being independent, the 1σ widths add up in quadrature,

$$(\Delta_X)^2=(\Delta_X^{{\rm PDF}+\alpha_s})^2+(\Delta_X^{\rm scale})^2\,, \tag{6.14}$$

as dictated by Section 3.2, i.e. irrespective of the $\pi_X^{{\rm PDF}+\alpha_s}$ and $\pi_X^{\rm scale}$ shapes. Recall that $\Delta_X$ is the 1σ width of the resulting $\pi_X$ distribution. The prior of this total uncertainty is then given by (see Eq. (3.5))

$$\bar\pi_X=\bar\pi_X^{{\rm PDF}+\alpha_s}*\bar\pi_X^{\rm scale}\,, \tag{6.15}$$

with $\bar\pi_X(x)=\pi_X(x/\Delta_X)$ and x corresponding to $\delta_X\Delta_X$.

Let us discuss the form of the $\pi_X$ function, as generated through Eq. (6.15). The shape of $\pi_X^{\rm scale}$ being unknown, we assume a flat distribution. Recall that this error is simply obtained by varying the QCD scale, so that no favoured value is predicted for the cross section. It is therefore a sensible choice to assign equal probabilities to all the values of $\delta_X^{\rm scale}$ (or equivalently of the Higgs cross section) inside a certain range. On the other hand, we have seen in Section 6.1 that $\pi_X^{{\rm PDF}+\alpha_s}$ is approximately Gaussian. Given the relative values of $\Delta_X^{{\rm PDF}+\alpha_s}$ and $\Delta_X^{\rm scale}$ for each process X, which are systematically such that either […] or […], a Gaussian $\pi_X^{{\rm PDF}+\alpha_s}$ and a flat $\pi_X^{\rm scale}$ lead, to a good approximation, to a final Gaussian $\pi_X$. This combination is shown in Fig. (2) for ZH production, for which $\Delta_{\rm ZH}^{{\rm PDF}+\alpha_s}\simeq2.5\%$ and $\Delta_{\rm ZH}^{\rm scale}=\Delta_{\rm ZH}|_0\simeq3.1\%$ (at $\sqrt s=8$ TeV with $m_H\simeq125.2$ GeV) [17].
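For a Gaussian and a flat prior, the convolution of Eq. (6.15) has a closed form in terms of the error function: the resulting density is $p(x)=[{\rm erf}((x+h)/(\sqrt2\sigma_g))-{\rm erf}((x-h)/(\sqrt2\sigma_g))]/(4h)$, with $\sigma_g$ the Gaussian width and $h=\sqrt3\,\Delta^{\rm scale}$ the half-width of the flat prior. The Python sketch below evaluates it for the ZH values quoted above and checks numerically that its width equals the quadrature sum of Eq. (6.14). It does not reproduce Fig. (2) itself, and the peak ratio it prints is just one (our) way of gauging the departure from a Gaussian shape.

```python
import math


def gauss_flat_density(x, sigma_g, sigma_f):
    """Gaussian prior (std sigma_g) convoluted with a flat prior of std sigma_f,
    i.e. flat on [-h, h] with h = sqrt(3) * sigma_f; closed form via erf."""
    h = math.sqrt(3.0) * sigma_f
    s = sigma_g * math.sqrt(2.0)
    return (math.erf((x + h) / s) - math.erf((x - h) / s)) / (4.0 * h)


d_pdf, d_scale = 2.5, 3.1  # Delta_ZH^{PDF+alpha_s} and Delta_ZH^{scale} in %, as quoted above
dx = 0.01
xs = [-40.0 + k * dx for k in range(8001)]
p = [gauss_flat_density(x, d_pdf, d_scale) for x in xs]

norm = sum(p) * dx
width = math.sqrt(sum(x * x * v for x, v in zip(xs, p)) * dx / norm)
gauss_peak = 1.0 / (width * math.sqrt(2.0 * math.pi))
peak_ratio = gauss_flat_density(0.0, d_pdf, d_scale) / gauss_peak

print(f"numerical width: {width:.3f}%, quadrature of Eq. (6.14): {math.hypot(d_pdf, d_scale):.3f}%")
print(f"peak relative to a Gaussian of the same width: {peak_ratio:.3f}")
```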


• The ggF reaction: 

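In the case of Higgs production via the ggF mechanism, the PDF error has to be combined with the whole amplitude error studied previously in Section 6.2. The resulting total error on the cross section is

$$\delta_{\rm ggF}\Delta_{\rm ggF}=\delta_{\rm ggF}^{{\rm PDF}+\alpha_s}\Delta_{\rm ggF}^{{\rm PDF}+\alpha_s}+\delta_{\rm ggF}^{\rm amp}\Delta_{\rm ggF}^{\rm amp}\,. \tag{6.16}$$

These two errors being independent, their widths add up in quadrature,

$$(\Delta_{\rm ggF})^2=(\Delta_{\rm ggF}^{{\rm PDF}+\alpha_s})^2+(\Delta_{\rm ggF}^{\rm amp})^2\,, \tag{6.17}$$

and their priors are convoluted following

$$\bar\pi_{\rm ggF}=\bar\pi_{\rm ggF}^{{\rm PDF}+\alpha_s}*\bar\pi_{\rm ggF}^{\rm amp}\,. \tag{6.18}$$

This convolution (6.18) is performed in Fig. (3), using the $\pi_{\rm ggF}^{\rm amp}$ distribution obtained in Fig. (1) and the value $\Delta_{\rm ggF}^{{\rm PDF}+\alpha_s}\simeq7.20\%$ (at $\sqrt s=8$ TeV with $m_H\simeq125.2$ GeV) [17]. Both priors, $\pi_{\rm ggF}^{{\rm PDF}+\alpha_s}$ and $\pi_{\rm ggF}^{\rm amp}$, being nearly Gaussian, the final distribution is almost Gaussian.

Whatever the prescription: $\Delta_X^{\rm scale}=\Delta_X|_0$ or $\Delta_X^{\rm scale}=\Delta_X|_0/\sqrt3$.

Recall that the convolution of two Gaussian distributions gives rise to a Gaussian distribution.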
Figure 2: Probability density distribution, $\pi_{\rm ZH}(x/\Delta_{\rm ZH})$, involving the relative error x (in %) of the ZH production cross section, as derived through the convolution of a Gaussian $\pi_{\rm ZH}^{{\rm PDF}+\alpha_s}$ and a flat $\pi_{\rm ZH}^{\rm scale}$ prior (both in blue). The quantity $\Delta_{\rm ZH}$ represents the relative 1σ error on the Higgs production rate. The normalisation is chosen such that all the functions possess the same maximum, equal to unity at the origin. The 1σ band for the $\pi_{\rm ZH}^{\rm scale}$ distribution is indicated by the vertical dotted lines.

Figure 3: Probability density distribution, $\pi_{\rm ggF}(x/\Delta_{\rm ggF})$ (in red), involving the relative error x (in %) of the ggF cross section, as derived through the convolution of a Gaussian prior $\pi_{\rm ggF}^{{\rm PDF}+\alpha_s}$ and the $\pi_{\rm ggF}^{\rm amp}$ distribution obtained in Fig. (1) (both in blue). The quantity $\Delta_{\rm ggF}$ represents the relative 1σ error on the ggF rate.

6.4 The production contamination 

There are several production mechanisms for the Higgs boson (recall that X = {ggF, VBF, WH, ZH, ttH}). The cross section for each of these production modes is associated with a theoretical uncertainty, which has been obtained in Subsections 6.1 to 6.3. In fact, the uncertainties of these various cross sections are potentially correlated, as they partly arise from common sources like the $\alpha_s$ parametric error. Therefore the $\delta_X$ follow a common distribution $\pi^\sigma$, which does not necessarily factorise into $\pi_{\rm ggF}\times\pi_{\rm VBF}\times\dots$ The aspect of correlations among the cross section errors will be further discussed in Section 7.1. Here we shall proceed using the most general prior $\pi^\sigma$, and we denote the resulting correlation matrix by $\rho_{XX'}$.

The contribution from the cross section errors in a given detection channel can be read from Eq. (5.8). Let us first adopt a more compact notation,

$$\sum_X\delta_X A_{X,i}\,, \tag{6.19}$$

where the $\delta_X$ (and the corresponding $\Delta_X$) are those defined in Eqs. (6.13), (6.16). The Higgs detection channels have been designed to select predominantly a certain production mode. That is, for a given channel i, the experimental cuts are profiled so that typically the efficiency for one of the production modes X (see Eq. (4.6)) is much larger than for the others, implying a hierarchy among the $A_{X,i}$. We can therefore use the leading moment approximation, developed in Section 3 and Appendix A, to combine the errors. Applying the leading moment approximation amounts to treating the contaminations as a small perturbation of the uncertainty from the leading production mode. The cross section uncertainties propagate in a given detection channel as (P stands for production)

$$\delta^P_{X_i}\Delta^P_i=\delta_{\rm ggF}A_{{\rm ggF},i}+\delta_{\rm VBF}A_{{\rm VBF},i}+\delta_{\rm ZH}A_{{\rm ZH},i}+\delta_{\rm WH}A_{{\rm WH},i}+\dots \tag{6.20}$$

Here the label of the combined nuisance parameter $\delta^P_{X_i}$ is chosen to be the label of the dominant production mode in the channel i. Note that $X_i$ should be understood as $X(i)$. This naming reflects the fact that the shape of the combined nuisance parameter prior corresponds approximately to the shape of the dominant uncertainty, see Eq. (3.14). For example, if the production mode ggF dominates in the channel i, one has

$$\delta^P_{X_i}=\delta^P_{\rm ggF}\,. \tag{6.21}$$

The various nuisance parameters $\delta^P_X$ are potentially correlated. They should thus follow a joint prior distribution, $\pi^P$, generating a correlation matrix $\rho^P_{XX'}$.

Assuming generic correlations $\rho_{XX'}$ among the various cross section errors, the magnitude of the combined production uncertainty in a channel i is given exactly by

$$(\Delta^P_i)^2=\sum_{X,X'}\rho_{XX'}A_{X,i}A_{X',i}\,. \tag{6.22}$$

The leading moment approximation then dictates (see Eqs. (3.16)-(3.19)) that

$$\pi^P\approx\pi^\sigma\,. \tag{6.23}$$

Equation (6.23) implies that the correlations among the $\delta^P_X$ are approximately the same as those among the $\delta_X$, i.e.

$$\rho^P_{XX'}\approx\rho_{XX'}\,. \tag{6.24}$$

This fact can be understood as follows. Consider only two detection channels, i and j. If the same production mode $X=X_i=X_j$ dominates in both channels, they are nearly 100% correlated, so that they are described by a single nuisance parameter $\delta^P_X$, which is equivalent to saying that $\rho^P_{XX}\approx1$. Note that $\rho_{XX}=1$ by definition, so that $\rho^P_{XX}\approx\rho_{XX}$. Besides, if two different production modes $X_i\neq X_j$ dominate respectively in the i and j channels, the uncertainties in both channels are respectively described by $\delta^P_{X_i}$ and $\delta^P_{X_j}$. These two nuisance parameters inherit the correlation of the leading production modes $X_i$ and $X_j$, which is given by $\rho_{X_iX_j}$. Therefore one recovers Eq. (6.24).

In Section 7.1, the assumptions adopted for $\rho_{XX'}$ will allow us to express $\pi^\sigma$ in terms of the $\pi_X$.
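Equation (6.22) is a quadratic form in the $A_{X,i}$ and is easily evaluated. In the sketch below we assume, purely for illustration, that $A_{X,i}$ is the (hypothetical) fraction of the channel-i signal coming from mode X times a 7% cross-section error. It shows that the result interpolates between the quadrature sum (uncorrelated modes) and the linear sum (all $\rho_{XX'}=1$), and that, for uncorrelated modes, a 10% contamination barely changes the width with respect to the leading term.

```python
import math


def production_width(a, rho):
    """Eq. (6.22): Delta_i^P = sqrt(sum_{X,X'} rho_{XX'} A_{X,i} A_{X',i}).

    a   : dict mode -> A_{X,i} for one detection channel i
    rho : dict (mode, mode') -> correlation coefficient (symmetric, rho_XX = 1)"""
    modes = list(a)
    var = sum(rho[(x, y)] * a[x] * a[y] for x in modes for y in modes)
    return math.sqrt(var)


def uniform_rho(modes, r):
    """Hypothetical correlation matrix with a common off-diagonal value r."""
    return {(x, y): 1.0 if x == y else r for x in modes for y in modes}


# Hypothetical ggF-dominated channel: 90% ggF, 10% VBF contamination, 7% errors each.
a = {"ggF": 0.9 * 0.07, "VBF": 0.1 * 0.07}
for r in (0.0, 0.5, 1.0):
    print(f"rho = {r}: Delta^P = {production_width(a, uniform_rho(a, r)):.4f}")
```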

Finally, notice that for certain kinematical cuts selecting the ttH mode in the diphoton 
decay channel [29], even additional production modes can slightly contribute, like the bbH, 
tHW and tHbq productions. These production modes participate in the contamination 
and have thus been included in the combination of production modes in Eq. (6.20). 


6.5 The uncertainties on branching ratios 


Two sources of error affect the Higgs signal strengths: the production and the decay rate uncertainties (see Eq. (4.6)). The latter are often not considered in Higgs fits. Still following our approach of step-by-step combinations, one should start with the signal strength error of Eq. (5.8), where all uncertainties on production modes have already been combined (Eq. (6.20)). The uncertainties on production and decay rates thus combine as, up to an irrelevant global sign,

$$\delta_{X_i}\Delta_i=\delta^P_{X_i}\Delta^P_i+\delta^{\rm BR}_{X_i}\Delta^{\rm BR}_i\,,\qquad{\rm with}\quad \Delta^{\rm BR}_i=\frac{\Delta{\rm BR}^{\rm SM}_i}{{\rm BR}^{\rm SM}_i}\,,\quad {\rm BR}^{\rm SM}_i=\frac{\Gamma^{\rm SM}_{Y_i}}{\Gamma^{\rm SM}_{\rm tot}}\,, \tag{6.25}$$


where $\Gamma^{\rm SM}_{Y_i}$ is the SM partial decay width for the detection channel i. In this equation, we apply the leading moment approximation to treat the branching ratio errors as perturbations of the leading error from production modes. This is why the $\delta^{\rm BR}_{X_i}$ parameters carry the index $X_i$, which is the index of the dominant production mode in the channel i, as in the previous subsection. For example, if the production mode ggF dominates in the channel i, one has

$$\delta^{\rm BR}_{X_i}=\delta^{\rm BR}_{\rm ggF}\,. \tag{6.26}$$

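The relative error $\delta^{\rm BR}_{X_i}\Delta^{\rm BR}_i$ on the SM branching ratio is expressed as in Eq. (5.7), where the decay width uncertainty (5.10) can now be specified in terms of the various sources of error (cf. Section 3 of Ref. [55] for a recent overview, and references therein):

$$\delta^\Gamma_Y\Delta^\Gamma_Y=\delta^{\rm th}_Y\Delta^{\rm th}_Y+\sum_a\delta^a_Y\Delta^a_Y\,,\qquad{\rm where\ e.g.}\quad \Delta^{\rm th}_Y=\frac{\Delta\Gamma^{\rm th}_Y}{\Gamma^{\rm SM}_Y}\,. \tag{6.27}$$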


The partial decay width errors are taken from the LHCHWG [17, 18, 20]. The $\Delta\Gamma^{\rm th}_Y$ denote the theoretical uncertainties due to the limitations of perturbative QCD calculations. The $\Delta\Gamma^a_Y$ represent the parametric uncertainties induced by the experimental errors on the input parameters, labelled by $a=\alpha_s,m_c,m_b,m_t$ (strong coupling and charm, bottom and top quark masses). Typically, one has […] ≫ […], since the QCD corrections to the $h\to VV^*$, […] decay channels arise only at orders higher than or equal to $\mathcal{O}(\alpha_s)$.

The $\Delta\Gamma^a_Y$ errors are associated with Gaussian distributions, and are thus identified without ambiguity with the errors defined in Ref. [20]. The $\Delta\Gamma^{\rm th}_Y$ errors are purely theoretical, so one associates them with flat priors. To adopt a conservative prescription, as in Section 6.3, we interpret the numbers given in [17] as 1σ widths. These numbers are thus directly identified with the $\Delta\Gamma^{\rm th}_Y$.

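Now, inserting Eq. (6.27) into Eq. (5.7) provides the contributions of the theoretical and parametric uncertainties to the branching ratios,

$$\delta^{\rm BR}_{X_i}\Delta^{\rm BR}_i=\sum_Y\big({\rm BR}^{\rm SM}_Y-\delta_{YY_i}\big)\Big(\delta^{\rm th}_Y\Delta^{\rm th}_Y+\sum_a\delta^a_Y\Delta^a_Y\Big)\equiv\sum_Y\delta^{\rm th}_Y\Delta^{\rm th}_{Y,i}+\sum_{Y,a}\delta^a_Y\Delta^a_{Y,i}\,, \tag{6.28}$$

where $\delta_{YY_i}$ is the Kronecker symbol and where, in the last step, one introduces a compact notation for the error magnitudes. The sum over Y here must include all the individual Higgs decay channels (not only those effectively detected at colliders), namely Y = $b\bar b$, $c\bar c$, WW, ZZ, ττ, γγ, gg, ...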

We stress that the parametric error $\delta^a_Y$ on the various decay rates Y arises from the same source (namely, varying the fundamental parameter a). The parametric errors on the various decays are thus fully correlated. Therefore, one could in principle drop the Y index on $\delta^a_Y$. There is however a subtlety, because these errors can be either 100% correlated or 100% anti-correlated. The use of $\delta^a$ parameters would make the full correlation manifest, but minus signs would have to be included in certain $\Delta^a_Y$. Here instead, we choose positive $\Delta$'s by convention. We thus have to keep the Y index on $\delta^a_Y$, bearing in mind that this Y labels only 100% correlation or anti-correlation. A second subtlety is that these signs are actually not clearly given in the literature; only the absolute values of the $\Delta^a_Y|_0$ are provided. We adopt a conservative choice by assuming that all these errors are 100% correlated.

We can now apply the leading moment approximation to the combination of Eqs. (6.25)-(6.28), where the leading uncertainty is $\delta^P_{X_i}\Delta^P_i$ and the perturbation is $\delta^{\rm BR}_{X_i}\Delta^{\rm BR}_i$, i.e. $\Delta^P_i\gg\Delta^{\rm th}_{Y,i},\,\Delta^a_{Y,i}$. The 1σ width of the global theoretical uncertainty in a channel i is given by

$$(\Delta_i)^2=(\Delta^P_i)^2+\sum_a\Big(\sum_Y\Delta^a_{Y,i}\Big)^2+\sum_Y\big(\Delta^{\rm th}_{Y,i}\big)^2\,, \tag{6.29}$$

with $\Delta^P_i$ given by Eq. (6.22). Regarding the prior distribution of the $\delta_{X_i}$, the discussion is exactly the same as in Section 6.4. That is, following the leading moment approximation, the joint distribution of the $\delta_{X_i}$ corresponds to that of the leading uncertainties $\delta^P_{X_i}$, so that

$$\pi^\delta\approx\pi^P\,. \tag{6.30}$$

This implies in particular that the $\delta_{X_i}$ inherit the correlations of the $\delta^P_{X_i}$, that is $\rho^\delta_{XX'}\approx\rho^P_{XX'}$.

Let us discuss the correlations used to derive Eq. (6.29), which are drawn from Refs. [17, 18, 20]. First, a given parametric uncertainty associated with $\delta^a_Y$ introduces 100% correlated errors among the various decay modes Y, so that the sum over Y of the $\Delta^a_{Y,i}$ is linear. Recall that the parametric correlations are all taken to be positive. There is also a slight correlation between $\delta^P_{X_i}\Delta^P_i$ and the $\Delta^{\alpha_s}_{Y,i}$, because $\Delta^P_i$ also contains a contribution from the $\alpha_s$ error. The $\alpha_s$ contribution being subleading, its correlation with $\Delta^P_i$ is expected to be small, so that we can neglect it. All the other sources of uncertainty are independent due to their different origins, so that summations in quadrature appear everywhere else in Eq. (6.29).

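Using the definitions of the reduced $\Delta$'s in Eq. (6.28), we finally write explicitly the total theoretical uncertainty on the signal strength of a Higgs detection channel i,

$$(\Delta_i)^2=(\Delta^P_i)^2+\sum_a\Big[\sum_Y\Delta^a_Y\big({\rm BR}^{\rm SM}_Y-\delta_{YY_i}\big)\Big]^2+\sum_Y\Big[\Delta^{\rm th}_Y\big({\rm BR}^{\rm SM}_Y-\delta_{YY_i}\big)\Big]^2\,. \tag{6.31}$$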
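Equation (6.31) can be transcribed directly. The following Python sketch does so with hypothetical dictionaries of branching ratios and errors (the toy numbers are not the LHCHWG values). It keeps the linear sum over Y inside each parametric source a and the quadrature sum for the theoretical errors, as discussed above.

```python
import math


def signal_strength_width(delta_p, br, delta_th, delta_par, detected):
    """Eq. (6.31): total 1-sigma relative error on the signal strength of channel i.

    delta_p   : Delta_i^P, the combined production error of Eq. (6.22)
    br        : dict Y -> BR_Y^SM, over *all* decay channels
    delta_th  : dict Y -> Delta_Y^th (relative theoretical error on Gamma_Y)
    delta_par : dict a -> dict Y -> Delta_Y^a (relative parametric error on Gamma_Y)
    detected  : the decay channel Y_i observed in channel i"""
    def weight(y):
        # BR_Y^SM - delta_{Y Y_i}, with delta the Kronecker symbol
        return br[y] - (1.0 if y == detected else 0.0)

    # parametric errors: 100% correlated across Y, hence a linear sum inside each a
    par = sum(sum(d_y * weight(y) for y, d_y in by_y.items()) ** 2
              for by_y in delta_par.values())
    # theoretical errors: independent across Y, hence a quadrature sum
    th = sum((delta_th[y] * weight(y)) ** 2 for y in br)
    return math.sqrt(delta_p ** 2 + par + th)


# Toy inputs (hypothetical): three decay classes, one parametric source (m_b).
br = {"bb": 0.58, "WW": 0.22, "other": 0.20}
delta_th = {"bb": 0.01, "WW": 0.005, "other": 0.01}
delta_par = {"mb": {"bb": 0.02, "WW": 0.0, "other": 0.0}}
print(signal_strength_width(0.07, br, delta_th, delta_par, detected="WW"))
```

A useful sanity check: if the detected channel had a branching ratio of one, all the weights ${\rm BR}^{\rm SM}_Y-\delta_{YY_i}$ would vanish and only $\Delta^P_i$ would survive.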


6.6 Summary 

In this section we have assembled, step by step, all the theoretical uncertainties on the Higgs signal strengths, starting from the Higgs likelihood of Eq. (6.2). This combination is made possible by the statistical analysis of Section 3, whose results have been extensively used here. The final Higgs likelihood involving the combined errors reads

$$L(c_V,c_F)=\int d\delta\,\pi^\delta(\delta)\,\exp\Big[-\frac12\sum_{i,j}\big(\hat\mu_i-\mu_i(c_V,c_F)(1+\delta_{X_i}\Delta_i)\big)\,(C^{-1})_{ij}\,\big(\hat\mu_j-\mu_j(c_V,c_F)(1+\delta_{X_j}\Delta_j)\big)\Big]\,. \tag{6.32}$$

The only label of the combined nuisance parameters $\delta_{X_i}$ is $X_i$, the dominant production mode for a given channel i (see for instance Eq. (6.26)). The prior $\pi^\delta$ is approximately equal to the prior of the production mode uncertainties, $\pi^\sigma$, through Eqs. (6.23) and (6.30). In Section 7.1, the assumptions on the correlations among the production modes will allow us to express $\pi^\sigma$ in terms of the priors of the individual production mode uncertainties (see Eqs. (3.16)-(3.19)).

One of the outcomes of the combination procedure followed throughout this section is that the shape of the combined prior $\tilde\pi$ appears to be almost Gaussian. This is partly because some of the priors for the individual sources of uncertainty are Gaussian. However, the main reason is that a substantial number of the individual sources of uncertainty are independent and of the same order of magnitude. These conditions resemble those of the central limit theorem, which predicts that the combination converges towards a Gaussian distribution. Besides, the small errors from contamination and partial decay widths do not affect the final prior shape under the leading moment approximation. It follows that the $\tilde\pi$ distribution is close to a multivariate Gaussian distribution.
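The central-limit argument above can be illustrated numerically. The sketch below is a toy, not the actual combination of Section 6: it assumes $n$ independent flat priors of equal size, each normalised to unit variance (hence supported on $[-\sqrt3,\sqrt3]$), and measures the excess kurtosis of their normalised sum. The excess kurtosis vanishes for a Gaussian; in this toy it is expected to behave as $-1.2/n$, since fourth cumulants add for independent variables. The function names are our own.

```python
import math
import random


def flat_unit_variance(rng):
    """One draw from a flat prior on [-sqrt(3), sqrt(3)], which has unit variance."""
    half_width = math.sqrt(3.0)
    return rng.uniform(-half_width, half_width)


def combined_error_samples(n_sources, n_samples, seed=1):
    """Sum n_sources independent unit-variance flat errors, rescaled to unit variance."""
    rng = random.Random(seed)
    norm = 1.0 / math.sqrt(n_sources)
    return [norm * sum(flat_unit_variance(rng) for _ in range(n_sources))
            for _ in range(n_samples)]


def excess_kurtosis(samples):
    """Sample excess kurtosis m4 / m2**2 - 3 (zero for a Gaussian)."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    return m4 / m2 ** 2 - 3.0


if __name__ == "__main__":
    for n_sources in (1, 2, 5, 10):
        k = excess_kurtosis(combined_error_samples(n_sources, 50_000))
        print(f"n = {n_sources:2d}: excess kurtosis {k:+.3f} "
              f"(toy expectation {-1.2 / n_sources:+.3f})")
```

Already for a handful of comparable sources the shape is hard to distinguish from a Gaussian at the level of the fourth moment, which is the qualitative point made in the text.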

Finally, we stress again that the famous question of the linear versus quadratic summation of individual errors (such as the ones used in this section to derive $\Delta_i$ in Eq. (6.31)) relies uniquely on the correlations among the errors, and is therefore independent of the shapes of the priors. This general feature holds when uncertainties are combined using Bayesian statistics.
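To make the last statement concrete, the toy check below (our own illustration, with arbitrary error sizes) combines two errors $\delta_1\Delta_1+\delta_2\Delta_2$ with unit-variance $\delta$'s. For a correlation $\rho$ the width of the sum is $\sqrt{\Delta_1^2+\Delta_2^2+2\rho\Delta_1\Delta_2}$: linear for $\rho=1$, quadratic for $\rho=0$. A Monte Carlo estimate with Gaussian and with flat shapes gives the same widths, since only second moments enter.

```python
import math
import random


def combined_width(d1, d2, rho):
    """Width of delta1*d1 + delta2*d2 for unit-variance deltas with correlation rho."""
    return math.sqrt(d1 ** 2 + d2 ** 2 + 2.0 * rho * d1 * d2)


def mc_width(d1, d2, fully_correlated, shape, n=100_000, seed=2):
    """Monte Carlo RMS of delta1*d1 + delta2*d2 for rho = 1 or rho = 0."""
    rng = random.Random(seed)
    half_width = math.sqrt(3.0)  # flat prior on [-sqrt(3), sqrt(3)] has unit variance

    def draw():
        if shape == "gauss":
            return rng.gauss(0.0, 1.0)
        return rng.uniform(-half_width, half_width)

    total = 0.0
    for _ in range(n):
        x1 = draw()
        x2 = x1 if fully_correlated else draw()  # rho = 1, or independent draws (rho = 0)
        total += (x1 * d1 + x2 * d2) ** 2
    return math.sqrt(total / n)  # the mean is zero, so the RMS is the width


if __name__ == "__main__":
    d1, d2 = 0.10, 0.05
    for shape in ("gauss", "flat"):
        print(shape, round(mc_width(d1, d2, True, shape), 4),
              round(mc_width(d1, d2, False, shape), 4))
    print("expected:", combined_width(d1, d2, 1.0), combined_width(d1, d2, 0.0))
```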
7 Marginalising the Higgs likelihood
7.1 Correlations of the detection channels 

In this subsection we focus on the correlations among Higgs detection channels induced by 
the theoretical uncertainties. These correlations appear whenever a source of uncertainty 
contributes simultaneously to various channels. 

As a preliminary observation, let us recall that these correlations are sometimes not taken into account in the literature. What is typically done in such cases is that some amount of error, typically from Refs. [17-20], is added independently to the statistical error of each detection channel. If done in quadrature, such a combination typically reads $(\Delta\mu_i^{\rm exp})^2+(\mu_i^{\rm exp}\Delta_i)^2$. From the point of view of nuisance parameters, this combination would correspond to associating one independent $\delta_i\Delta_i$ to each detection channel, and thus performing one integration per channel in the marginal likelihood.

The issue with such an approach is that the correlations among channels induced by the theoretical uncertainties are lost. As stated in Section 2.2.2, these correlations are crucial because they potentially change the tension among the various channel measurements, which in turn can modify the best-fit regions. As slight modifications of the best-fit regions are expected in the presence of new physics, treating the theoretical uncertainties correctly is fundamental.

Taking into account the correlations among channels amounts to consistently propagating the theoretical errors into the different detection channels. This is precisely what is done through the combination procedure of Section 6. Combining the errors together and using the leading moment approximation to treat subdominant errors, only five nuisance parameters $\tilde\delta_{\rm ggF}$, $\tilde\delta_{\rm VBF}$, $\tilde\delta_{\rm WH}$, $\tilde\delta_{\rm ZH}$, $\tilde\delta_{\rm ttH}$ arise (see Eqs. (6.31)-(6.32)). The uncertainty on each channel is described by only one of these $\tilde\delta_X$, where $X$ corresponds to the dominant production mode in this channel. That is, all channels dominated by the same production mode $X$ have the same nuisance parameter $\tilde\delta_X$. This implies that these channels are 100% correlated.

In principle, the combination procedure of Section 6 describes the complete distribution $\tilde\pi$ of the $\tilde\delta_X$, including the correlations $\rho_{XX'}$ among the different $\tilde\delta_X$. In practice, a complete knowledge of the correlations among the individual sources of uncertainty is needed to obtain $\rho_{XX'}$. Here we consider the determination of $\rho_{XX'}$ to be beyond the scope of this paper since, for example, one would have to work out precisely the correlations among the Higgs production modes induced by the PDF data uncertainties. Using the information available in the literature, we rather consider some characteristic cases for $\rho_{XX'}$.

Let us first discuss the typical correlations induced by the PDF uncertainties (originating from the PDF data fit) and the scale uncertainties (cf. Section 6.2) on the production cross sections. From now on, the $\tilde\delta_X$ are denoted as $\delta_X$ for simplicity,

$$
\tilde\delta_X \equiv \delta_X\,. \tag{7.1}
$$

First, we set $\delta_{\rm ggF}=-\delta_{\rm ttH}$, since an anti-correlation between the corresponding PDF errors is reported in Ref. [57]. Note that in reality this anti-correlation is not total (its value is $-0.6$ in Ref. [57]) and, furthermore, the other source of error, the scale uncertainty, does not correlate the ggF and ttH cross sections, as these come from independent QCD calculations.

The correlation coefficients of the PDF errors for the three other production modes, between about 0.63 and 0.93 [57], motivate us to take $\delta_{\rm VBF}=\delta_{\rm ZH}=\delta_{\rm WH}$. This assumption is further justified by the fact that the PDF error is larger than the scale error (particularly for VBF) and that the scale error most probably correlates the ZH and WH modes.

The correlation coefficients of the PDF errors between ggF and WH ($-0.23$), ZH ($-0.14$) or VBF ($-0.57$) suggest considering the two extreme cases of vanishing correlation and 100% anti-correlation. Since the scale uncertainties tend to decorrelate these modes, these two extreme cases are the natural ones to study. All these assumptions are summarized in the two following configurations of the nuisance parameters,

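$$
\delta_{\rm ggF} = -\delta_{\rm ttH}\,, \qquad \delta_{\rm VBF} = \delta_{\rm ZH} = \delta_{\rm WH}\,, \tag{7.2}
$$

$$
-\delta_{\rm ggF} = \delta_{\rm ttH} = \delta_{\rm VBF} = \delta_{\rm ZH} = \delta_{\rm WH}\,, \tag{7.3}
$$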

keeping in mind that the realistic situation lies in between these extreme cases. 

Regarding the PDF set error, the individual uncertainties giving rise to this error are not available in the literature. Rather, only the global PDF set error is estimated, by changing several assumptions at a time. One can at least notice that the PDF set errors are potentially correlated negatively for the ggF and VBF reactions and positively for the VBF and VH processes, as observed from the relative signs of the rate variations in Fig. (57) of Ref. [20] when changing the PDF set. These correlations are roughly consistent with the ones in Eq. (7.3).

Let us describe how the correlation configurations of Eqs. (7.2)-(7.3) are related to the prior $\tilde\pi$ appearing in the marginal likelihood (6.32). This prior is approximately equal to the prior of the production-mode uncertainties (Eq. (6.23) and Eq. (6.30)), which can itself be expressed (according to Eqs. (3.18)-(3.19)) in terms of the individual priors $\pi_X$ under the assumptions (7.2)-(7.3). One ends up with the two final priors, associated respectively with the correlation configurations of Eqs. (7.2)-(7.3),

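$$
\tilde\pi(\delta_X) = \pi_{\rm ggF}(\delta_{\rm ggF})\,\delta(\delta_{\rm ggF}+\delta_{\rm ttH})\,\pi_{\rm VBF}(\delta_{\rm VBF})\,\delta(\delta_{\rm VBF}-\delta_{\rm ZH})\,\delta(\delta_{\rm VBF}-\delta_{\rm WH})\,, \tag{7.4}
$$

$$
\tilde\pi(\delta_X) = \pi_{\rm ggF}(\delta_{\rm ggF})\,\delta(\delta_{\rm ggF}+\delta_{\rm ttH})\,\delta(\delta_{\rm ggF}+\delta_{\rm VBF})\,\delta(\delta_{\rm ggF}+\delta_{\rm ZH})\,\delta(\delta_{\rm ggF}+\delta_{\rm WH})\,, \tag{7.5}
$$

where $\delta(\cdot)$ denotes the Dirac distribution.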
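The Dirac distributions in Eqs. (7.4)-(7.5) simply state that the five $\delta_X$ are fixed, up to signs, by one or two free parameters. A minimal sketch of this bookkeeping, with hypothetical names of our own choosing (`CONFIG_72`, `CONFIG_73`, `expand_nuisances`), maps each production mode to the free parameter it follows and to a sign:

```python
# Each production mode X -> (free nuisance parameter it follows, sign), encoding the
# Dirac distributions of Eq. (7.4) (configuration (7.2)) and Eq. (7.5) (configuration (7.3)).
CONFIG_72 = {
    "ggF": ("ggF", +1), "ttH": ("ggF", -1),                    # delta_ggF = -delta_ttH
    "VBF": ("VBF", +1), "ZH": ("VBF", +1), "WH": ("VBF", +1),  # delta_VBF = delta_ZH = delta_WH
}
CONFIG_73 = {
    "ggF": ("ggF", +1),                                        # -delta_ggF = all the others
    "ttH": ("ggF", -1), "VBF": ("ggF", -1), "ZH": ("ggF", -1), "WH": ("ggF", -1),
}


def free_parameters(config):
    """Names of the independent nuisance parameters left after the Dirac deltas."""
    return sorted({param for param, _ in config.values()})


def expand_nuisances(config, free_values):
    """Return delta_X for all five production modes from the free parameters."""
    return {mode: sign * free_values[param] for mode, (param, sign) in config.items()}


if __name__ == "__main__":
    print(free_parameters(CONFIG_72))  # ['VBF', 'ggF']
    print(expand_nuisances(CONFIG_73, {"ggF": 0.5}))
```

The sign attached to each mode is the $\pm1$ factor that the channels dominated by that mode carry in the analytical likelihoods below.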

It is not clear from this reference whether the correlations also include the whole error from $\alpha_s$, which is 100% correlated between the production modes. Nevertheless, this source of error is minor compared to the other ones.

For consistency, these two configurations are also used to determine the $\rho'_{XX'}$ correlation matrix of Eq. (6.22).

Recall that Fig. (57) of Ref. [20] is for the 8 TeV LHC.
7.2 The Bayesian analytical likelihood

The $\pi_X$ priors deduced from the combination of all the cross section errors in Section 6.3 have been found to be nearly Gaussian distributions. These Gaussian shapes are obtained by choosing flat shapes for all the unknown priors of the theoretical uncertainties. As mentioned in Section 6.6, one expects this result to hold approximately for other choices of initial priors. Nevertheless, in order to account in our numerical results for the possibility of non-flat initial shapes, we also consider a totally different form of the final prior: we take it as a flat distribution. The choice of these two shapes (Gaussian and flat) provides an estimate of the impact of the prior shape on the final results. The distributions $\pi_X$ appearing in Eqs. (7.4)-(7.5) are hence defined as

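$$
\pi_X(\delta_X) = \frac{1}{\sqrt{2\pi}}\,e^{-\delta_X^2/2}\,, \tag{7.6}
$$

$$
\pi_X(\delta_X) = \begin{cases} \dfrac{1}{2\sqrt3} & \text{if } \delta_X\in[-\sqrt3,\sqrt3]\,,\\[4pt] 0 & \text{otherwise}\,, \end{cases} \tag{7.7}
$$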


for the Gaussian and flat cases respectively. Recall that the variance of all the $\delta$'s, including the $\delta_X$, is chosen to be equal to one for any prior shape. This appears clearly in Eq. (7.6) and implies the $[-\sqrt3,\sqrt3]$ interval in Eq. (7.7).
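For completeness, the half-width $a$ of the flat prior follows from the unit-variance condition in one line:

$$
{\rm Var}[\delta_X] = \int_{-a}^{a}\frac{\delta^2}{2a}\,d\delta = \frac{a^2}{3} = 1 \quad\Longrightarrow\quad a=\sqrt3\,,
$$

which also fixes the normalisation $1/(2a)=1/(2\sqrt3)$ of Eq. (7.7).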

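For analytical integrations of the final likelihood (6.32), it is convenient to denote by $\mathcal X=\{X,X',\ldots\}$ a subset of fully correlated production modes. We then denote by $\Omega_{\mathcal X}$ the subset of channels (labelled by $i$) dominated by the production modes contained in $\mathcal X$. In the presence of anti-correlations, one further divides $\Omega_{\mathcal X}$ into two anti-correlated subsets $\Omega^+_{\mathcal X}$ and $\Omega^-_{\mathcal X}$. Finally, the set of all channels is written $\Omega$. Assuming that the correlations among production modes follow Eq. (7.2), the set of detection channels is split into $\Omega_{\rm ggF,ttH}$ and $\Omega_{\rm VBF,WH,ZH}$; $\Omega_{\rm ggF,ttH}$ is then split into the anti-correlated subsets $\Omega^+_{\rm ggF,ttH}=\Omega_{\rm ggF}$ and $\Omega^-_{\rm ggF,ttH}=\Omega_{\rm ttH}$. Assuming instead the correlations of Eq. (7.3), there is a unique set $\Omega=\Omega_{\rm ggF,ttH,VBF,WH,ZH}$, which is split into the anti-correlated subsets $\Omega^+_{\rm ggF,ttH,VBF,WH,ZH}=\Omega_{\rm ggF}$ and $\Omega^-_{\rm ggF,ttH,VBF,WH,ZH}=\Omega_{\rm ttH,VBF,WH,ZH}$.

At this point it is also convenient to introduce the quantities $Q_{\mathcal X}$ and $\eta_{\mathcal X\mathcal X'}$, together with the sign $\zeta_i$, defined as

$$
\begin{aligned}
Q_{\mathcal X} &= \sum_{i\in\Omega_{\mathcal X}}\sum_{j\in\Omega}\zeta_i\,\mu_i^{\rm exp}\Delta_i\,(C^{-1})_{ij}\,\big(\mu_j[c_V,c_f]-\mu_j^{\rm exp}\big)\,,\\
\eta_{\mathcal X\mathcal X'} &= \sum_{i\in\Omega_{\mathcal X}}\sum_{j\in\Omega_{\mathcal X'}}\zeta_i\zeta_j\,\mu_i^{\rm exp}\Delta_i\,(C^{-1})_{ij}\,\mu_j^{\rm exp}\Delta_j\,,\\
\zeta_i &= \begin{cases}\ \ 1 & \text{if } i\in\Omega^+_{\mathcal X}\,,\\ -1 & \text{if } i\in\Omega^-_{\mathcal X}\,,\end{cases}
\end{aligned}
\tag{7.8}
$$

where $\Omega^+_{\mathcal X}=\Omega_{\mathcal X}$ when $\Omega_{\mathcal X}$ contains no anti-correlated channels. The overall sign convention for $\zeta_i$ (i.e. which subset is called $\Omega^+_{\mathcal X}$) is irrelevant. Note also that if $\mathcal X\neq\mathcal X'$ (as may occur in the $\eta_{\mathcal X\mathcal X'}$ function), there are no theoretical correlations at all between the channels belonging to $\Omega_{\mathcal X}$ and $\Omega_{\mathcal X'}$.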
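As a sketch of how these quantities can be evaluated, the snippet below computes $Q_{\mathcal X}=\sum_{i\in\Omega_{\mathcal X}}\sum_j a_i(C^{-1})_{ij}r_j$ and $\eta_{\mathcal X\mathcal X'}=\sum_{i\in\Omega_{\mathcal X},\,j\in\Omega_{\mathcal X'}}a_i(C^{-1})_{ij}a_j$, with $a_i=\zeta_i\mu_i^{\rm exp}\Delta_i$ and $r_j=\mu_j[c_V,c_f]-\mu_j^{\rm exp}$ (our reading of Eq. (7.8)), from per-channel inputs and an inverse covariance matrix given as nested lists. The function names and toy inputs are hypothetical. The check uses the identity behind the analytical integrations: for a symmetric $C^{-1}$, the shifted $\chi^2$ equals $\chi^2_0-2\sum_{\mathcal X}\delta_{\mathcal X}Q_{\mathcal X}+\sum_{\mathcal X\mathcal X'}\delta_{\mathcal X}\eta_{\mathcal X\mathcal X'}\delta_{\mathcal X'}$.

```python
from collections import defaultdict


def q_and_eta(mu_th, mu_exp, rel_err, zeta, group, cinv):
    """Q_X and eta_{XX'} with a_i = zeta_i mu_i^exp Delta_i and r_j = mu_j^th - mu_j^exp."""
    n = len(mu_th)
    resid = [mu_th[j] - mu_exp[j] for j in range(n)]
    shift = [zeta[i] * mu_exp[i] * rel_err[i] for i in range(n)]
    q, eta = defaultdict(float), defaultdict(float)
    for i in range(n):
        for j in range(n):
            q[group[i]] += shift[i] * cinv[i][j] * resid[j]
            eta[(group[i], group[j])] += shift[i] * cinv[i][j] * shift[j]
    return dict(q), dict(eta)


def quad_form(u, cinv, v):
    """u^T C^{-1} v for plain Python lists."""
    return sum(u[i] * cinv[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))


if __name__ == "__main__":
    # Toy inputs: three channels, group "A" (with one anti-correlated channel) and group "B".
    mu_th, mu_exp = [1.0, 0.9, 1.1], [1.2, 0.8, 1.5]
    rel_err, zeta, group = [0.10, 0.05, 0.20], [1, -1, 1], ["A", "A", "B"]
    cinv = [[4.0, 0.5, 0.0], [0.5, 2.0, 0.3], [0.0, 0.3, 1.0]]
    print(q_and_eta(mu_th, mu_exp, rel_err, zeta, group, cinv))
```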
In the case of a Gaussian prior (Eq. (7.6)), it is noteworthy that the most general likelihood (6.32) can be integrated analytically, resulting in the simple expression

$$
L^{\rm Gauss}(c_V,c_f) = L_0(c_V,c_f)\,\exp\Big[\frac12\sum_{\mathcal X,\mathcal X'} Q_{\mathcal X}\,\big(\delta_{\mathcal X\mathcal X'}+\eta_{\mathcal X\mathcal X'}\big)^{-1}\,Q_{\mathcal X'}\Big]\,. \tag{7.9}
$$


Here $\delta_{\mathcal X\mathcal X'}$ is the Kronecker symbol and the inverse is understood in the matrix sense; $L_0$ is the base likelihood defined in Eq. (5.1), i.e. the likelihood before introducing nuisance parameters. One observes that the marginal likelihood takes the form of a product of the base likelihood with a term generated by the theoretical uncertainties. This term, which depends on $c_V,c_f$ through $Q_{\mathcal X}$ as well as on all the theoretical and experimental uncertainties, implements all the deformations and correlations induced by the theoretical uncertainties.
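For a single group $\mathcal X$, the closed form can be checked against a direct numerical integration over $\delta_{\mathcal X}$ with a unit Gaussian prior. This sketch assumes that expanding the exponent of Eq. (6.32) gives $\delta Q-\eta\delta^2/2$ on top of $\log L_0$ (the sign convention for $Q$ is our assumption), keeps the constant $1/\sqrt{1+\eta}$ that is dropped from the likelihood normalisation, and uses arbitrary values of $Q$ and $\eta$.

```python
import math


def gaussian_marginal_factor(q, eta):
    """Closed form of int N(delta; 0, 1) exp(delta*q - eta*delta**2/2) d delta."""
    return math.exp(q * q / (2.0 * (1.0 + eta))) / math.sqrt(1.0 + eta)


def numeric_marginal_factor(q, eta, half_range=12.0, steps=20_000):
    """Same integral with the trapezoidal rule on [-half_range, half_range]."""
    h = 2.0 * half_range / steps
    total = 0.0
    for k in range(steps + 1):
        d = -half_range + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.exp(-0.5 * d * d + d * q - 0.5 * eta * d * d)
    return total * h / math.sqrt(2.0 * math.pi)


if __name__ == "__main__":
    for q, eta in [(0.0, 0.5), (1.5, 0.8), (-2.0, 3.0)]:
        print(q, eta, gaussian_marginal_factor(q, eta), numeric_marginal_factor(q, eta))
```

Only the $\exp[Q^2/(2(1+\eta))]$ part depends on $c_V,c_f$, which is why the prefactor can be dropped in the fit.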

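For the case of no experimental correlations between the different groups of channels of dominant production modes, including the case without any experimental correlations (see Section 5.1), one has $\eta_{\mathcal X\mathcal X'}=0$ for $\mathcal X\neq\mathcal X'$ and

$$
\eta_{\mathcal X\mathcal X}\equiv\eta_{\mathcal X}\,. \tag{7.10}
$$

The marginal likelihood (7.9) then reduces to

$$
L^{\rm Gauss}(c_V,c_f) = L_0(c_V,c_f)\prod_{\mathcal X}\exp\Big[\frac{Q_{\mathcal X}^2}{2(\eta_{\mathcal X}+1)}\Big]\,. \tag{7.11}
$$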

Note that this product runs over the different $\mathcal X$ subsets, i.e. there are no theoretical correlations among the channels belonging to different $\Omega_{\mathcal X}$ groups.

Note that if one assumes a single independent nuisance parameter per channel, each $\Omega_{\mathcal X}$ contains a single channel and there is no sum in Eq. (7.8), meaning that no correlation among channels is induced. One can directly verify that in the purely decorrelated case (neither experimental nor theoretical correlations), Eq. (7.11) gives back the primary likelihood (5.1) with a summation in quadrature of the absolute experimental and theoretical errors, $\Delta\mu_i^{\rm exp}$ and $\mu_i^{\rm exp}\Delta_i$.
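The decorrelated limit can be checked in a few lines. For one uncorrelated channel with experimental error $\sigma_i=\Delta\mu_i^{\rm exp}$, our reading of Eq. (7.8) gives $Q_i=a_ir_i/\sigma_i^2$ and $\eta_i=a_i^2/\sigma_i^2$ with $a_i=\mu_i^{\rm exp}\Delta_i$ and $r_i=\mu_i[c_V,c_f]-\mu_i^{\rm exp}$; the logarithm of Eq. (7.11) should then coincide with a Gaussian term using the quadrature sum $\sigma_i^2+a_i^2$. The numbers are illustrative only.

```python
def log_marginal_single_channel(mu_th, mu_exp, sigma_exp, rel_err):
    """log L_0 + Q^2 / (2 (eta + 1)) for one uncorrelated channel, as in Eq. (7.11)."""
    r = mu_th - mu_exp
    a = mu_exp * rel_err                  # absolute theoretical error mu^exp * Delta
    q = a * r / sigma_exp ** 2
    eta = a * a / sigma_exp ** 2
    return -0.5 * r * r / sigma_exp ** 2 + q * q / (2.0 * (eta + 1.0))


def log_quadrature(mu_th, mu_exp, sigma_exp, rel_err):
    """Gaussian log-likelihood with experimental and theoretical errors added in quadrature."""
    total_var = sigma_exp ** 2 + (mu_exp * rel_err) ** 2
    return -0.5 * (mu_th - mu_exp) ** 2 / total_var


if __name__ == "__main__":
    args = (1.0, 1.3, 0.25, 0.12)  # mu_th, mu_exp, Delta mu^exp, Delta (toy values)
    print(log_marginal_single_channel(*args), log_quadrature(*args))
```

Algebraically, $-\frac{r^2}{2\sigma^2}\big(1-\frac{a^2}{\sigma^2+a^2}\big)=-\frac{r^2}{2(\sigma^2+a^2)}$, which is what the two functions reproduce.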

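In the case of the flat prior of Eq. (7.7), there is no simple general form such as Eq. (7.9). However, assuming no experimental correlations among the various $\Omega_{\mathcal X}$ subsets, the marginal likelihood takes a simple form,

$$
L^{\rm flat}(c_V,c_f) = L_0(c_V,c_f)\prod_{\mathcal X} e^{Q_{\mathcal X}^2/(2\eta_{\mathcal X})}\left[{\rm Erf}\left(\frac{\sqrt3\,\eta_{\mathcal X}+Q_{\mathcal X}}{\sqrt{2\eta_{\mathcal X}}}\right)-{\rm Erf}\left(\frac{Q_{\mathcal X}-\sqrt3\,\eta_{\mathcal X}}{\sqrt{2\eta_{\mathcal X}}}\right)\right], \tag{7.12}
$$

where Erf is the standard error function.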
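As in the Gaussian case, the flat-prior closed form can be cross-checked for one group. The sketch evaluates $\int_{-\sqrt3}^{\sqrt3}\frac{d\delta}{2\sqrt3}\,e^{\delta Q-\eta\delta^2/2}$ (same assumed exponent, $\eta>0$) both with `math.erf` after completing the square and with Simpson's rule; the constant prefactors, which do not depend on $c_V,c_f$, are kept here so the two numbers can be compared directly.

```python
import math

SQRT3 = math.sqrt(3.0)


def flat_marginal_factor(q, eta):
    """Closed form of the flat-prior integral, completing the square (requires eta > 0)."""
    prefactor = math.sqrt(math.pi / (2.0 * eta)) / (2.0 * SQRT3)
    upper = (SQRT3 * eta + q) / math.sqrt(2.0 * eta)
    lower = (q - SQRT3 * eta) / math.sqrt(2.0 * eta)
    return prefactor * math.exp(q * q / (2.0 * eta)) * (math.erf(upper) - math.erf(lower))


def simpson_flat_marginal_factor(q, eta, steps=2_000):
    """Simpson's rule for the same integral over the support [-sqrt(3), sqrt(3)]."""
    if steps % 2:
        raise ValueError("Simpson's rule needs an even number of steps")

    def integrand(d):
        return math.exp(d * q - 0.5 * eta * d * d) / (2.0 * SQRT3)

    h = 2.0 * SQRT3 / steps
    total = integrand(-SQRT3) + integrand(SQRT3)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * integrand(-SQRT3 + k * h)
    return total * h / 3.0


if __name__ == "__main__":
    for q, eta in [(0.3, 1.0), (2.0, 0.5), (-1.5, 4.0)]:
        print(q, eta, flat_marginal_factor(q, eta), simpson_flat_marginal_factor(q, eta))
```

For very large $|Q|/\eta$ the difference of two error functions close to one loses precision; a production implementation would prefer `math.erfc` in that regime.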


7.3 The frequentist treatment 
7.3.1 The marginal likelihood 

In classical frequentist statistics, hypotheses are not associated with probabilities, so that there is no such thing as a prior distribution for a nuisance parameter. In the hybrid frequentist framework, however, one can associate a parameter with a “prior” distribution that

A similar expression can also be obtained for an arbitrary correlation matrix $\rho_{XX'}$. Note that we dropped an overall factor, as the likelihood is defined up to a normalisation constant.

We recall that such a combination should be avoided, as it is not realistic.
can be seen as an extra likelihood constraining the nuisance parameter. Pushing forward the analogy with the Bayesian case, we worked out the way to combine uncertainties within frequentist statistics in Section 3.3. One may find, however, that the Bayesian combination of uncertainties is better defined than the frequentist one.

More pragmatically, frequentist combinations are also more complicated, as the combination of the magnitudes of the errors (the $\Delta$'s) depends on the shape of the frequentist “priors”, contrary to the Bayesian case. These drawbacks can constitute motivations to follow instead the Bayesian approach developed in the previous sections. Nevertheless, for completeness, we describe here the final part of the frequentist method for the Higgs fit. For that purpose we consider in the following a generic prior $\tilde\pi(\delta_X)$, with error magnitudes $\Delta_i$ obtained after a first phase of frequentist combination.

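Recall that the frequentist marginalisation procedure, also called profiling, consists in maximising over the $\delta_X$ instead of integrating as done in Eq. (6.32). Hence the frequentist marginal Higgs likelihood reads

$$
L(c_V,c_f) = \max_{\{\delta_X\}}\ \tilde\pi(\delta_X)\,\exp\Big[-\frac12\sum_{i,j}\big(\mu_i[c_V,c_f]-\mu_i^{\rm exp}(1+\delta_{X_i}\Delta_i)\big)\,(C^{-1})_{ij}\,\big(\mu_j[c_V,c_f]-\mu_j^{\rm exp}(1+\delta_{X_j}\Delta_j)\big)\Big]\,. \tag{7.13}
$$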

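As often done in practice for the frequentist treatment, one can equivalently minimize the $\chi^2$ distribution, $\chi^2=-2\log L$, instead of performing the maximisation in Eq. (7.13),

$$
\chi^2(c_V,c_f) = \min_{\{\delta_X\}}\Big[-2\log\tilde\pi(\delta_X) + \sum_{i,j}\big(\mu_i[c_V,c_f]-\mu_i^{\rm exp}(1+\delta_{X_i}\Delta_i)\big)\,(C^{-1})_{ij}\,\big(\mu_j[c_V,c_f]-\mu_j^{\rm exp}(1+\delta_{X_j}\Delta_j)\big)\Big]\,. \tag{7.14}
$$

The best-fit point, given by the minimum in the $(c_V,c_f)$ parameter space, is noted $(\hat c_V,\hat c_f)$, and the best-fit regions are obtained by drawing contour levels of the difference (cf. Section 2.1)

$$
\Delta\chi^2(c_V,c_f) = \chi^2(c_V,c_f) - \chi^2(\hat c_V,\hat c_f) \tag{7.15}
$$

at the values given by Eq. (2.8).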
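Eq. (2.8) is not reproduced in this section. Assuming it corresponds to the usual asymptotic thresholds for two parameters of interest ($c_V$, $c_f$), the contour levels of Eq. (7.15) follow from the $\chi^2$ distribution with two degrees of freedom, whose cumulative distribution function is $1-e^{-x/2}$. A minimal sketch, with hypothetical helper names:

```python
import math


def delta_chi2_level(cl):
    """Delta chi^2 threshold for 2 parameters: solves 1 - exp(-x / 2) = cl."""
    if not 0.0 < cl < 1.0:
        raise ValueError("confidence level must lie in (0, 1)")
    return -2.0 * math.log(1.0 - cl)


def inside_region(chi2_point, chi2_best, cl):
    """True if a (c_V, c_f) point lies inside the region of Eq. (7.15) at level cl."""
    return chi2_point - chi2_best <= delta_chi2_level(cl)


if __name__ == "__main__":
    for cl in (0.68, 0.95, 0.99):
        print(f"{cl:.2f}: Delta chi^2 = {delta_chi2_level(cl):.2f}")
```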




7.3.2 The frequentist analytical likelihood 


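Assuming that the Bayesian and frequentist combinations of the errors lead to analogous shapes for the final priors, we consider both a Gaussian and a flat shape for each $\pi_X$ prior, as in Eqs. (7.6)-(7.7). In the Gaussian case, the marginal likelihood (7.13) can be computed analytically,

$$
L^{\rm Gauss}(c_V,c_f) = L_0(c_V,c_f)\,\exp\Big[\frac12\sum_{\mathcal X,\mathcal X'} Q_{\mathcal X}\,\big(\delta_{\mathcal X\mathcal X'}+\eta_{\mathcal X\mathcal X'}\big)^{-1}\,Q_{\mathcal X'}\Big]\,, \tag{7.16}
$$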
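where $Q_{\mathcal X}$, $\eta_{\mathcal X\mathcal X'}$ and $\zeta_i$ are defined as in Section 7.2. This is precisely the same result as for the Bayesian likelihood of Eq. (7.9).

For the case of no experimental correlations between the $\Omega_{\mathcal X}$'s, the marginal likelihood with a Gaussian prior thus simplifies just like in Eq. (7.11). In this case, the marginal likelihood with a flat prior also has an analytical expression,

$$
L^{\rm flat}(c_V,c_f) = \prod_{\mathcal X}\exp\Big[-\frac12\sum_{i,j\in\Omega_{\mathcal X}}\big(\mu_i[c_V,c_f]-\mu_i^{\rm exp}(1+\zeta_i\hat\delta_{\mathcal X}\Delta_i)\big)\,(C^{-1})_{ij}\,\big(\mu_j[c_V,c_f]-\mu_j^{\rm exp}(1+\zeta_j\hat\delta_{\mathcal X}\Delta_j)\big)\Big]\,, \tag{7.17}
$$

with

$$
\hat\delta_{\mathcal X} = \begin{cases} Q_{\mathcal X}/\eta_{\mathcal X} & \text{if } Q_{\mathcal X}/\eta_{\mathcal X}\in[-\sqrt3,\sqrt3]\,,\\ \sqrt3 & \text{if } Q_{\mathcal X}/\eta_{\mathcal X}>\sqrt3\,,\\ -\sqrt3 & \text{if } Q_{\mathcal X}/\eta_{\mathcal X}<-\sqrt3\,, \end{cases} \tag{7.18}
$$

where $Q_{\mathcal X}$ and $\eta_{\mathcal X}$ are defined as in Eqs. (7.8) and (7.10).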
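The clamped value $\hat\delta_{\mathcal X}$ is simply the maximiser of $\delta Q-\eta\delta^2/2$ restricted to the support $[-\sqrt3,\sqrt3]$ of the flat prior. The sketch below (same assumed exponent as in the Gaussian check, $\eta>0$, hypothetical names and arbitrary inputs) compares this closed form with a brute-force grid search.

```python
import math

SQRT3 = math.sqrt(3.0)


def profiled_delta(q, eta):
    """Unconstrained maximiser q/eta, clamped to the flat-prior support [-sqrt(3), sqrt(3)]."""
    return max(-SQRT3, min(SQRT3, q / eta))


def profiled_log_factor(q, eta):
    """log of the profiled factor multiplying L_0: delta_hat*q - eta*delta_hat**2/2."""
    d = profiled_delta(q, eta)
    return d * q - 0.5 * eta * d * d


def grid_max_log_factor(q, eta, points=200_001):
    """Brute-force maximum of the same function on a uniform grid over the support."""
    step = 2.0 * SQRT3 / (points - 1)
    return max((-SQRT3 + k * step) * q - 0.5 * eta * (-SQRT3 + k * step) ** 2
               for k in range(points))


if __name__ == "__main__":
    for q, eta in [(0.5, 1.0), (4.0, 1.0), (-3.0, 0.5)]:
        print(q, eta, profiled_delta(q, eta), profiled_log_factor(q, eta),
              grid_max_log_factor(q, eta))
```

The last two cases hit the boundary of the prior, which is exactly where the frequentist flat-prior likelihood departs from the Gaussian one.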


7.4 Numerical results 

The frequentist marginalisation (likelihood (7.16) for the Gaussian prior or (7.17) for the flat one) is not illustrated here, because the frequentist framework may seem slightly less consistent than the Bayesian one and the error combinations are more delicate. For these reasons, we rather recommend using the Bayesian marginalisation techniques for the Higgs fits. In any case, the Bayesian and frequentist approaches are expected to converge as the experimental uncertainties become small relative to the theoretical ones. This situation will gradually occur in the next LHC Runs due to the decrease of the statistical uncertainties and the expected improvement in the knowledge of the experimental systematic errors. We have described this feature in Ref. [58].

As a general remark allowing a better comprehension of the following subsections, let us explain in simple words why the presence of nuisance parameters can modify the size and the location of the best-fit domains in the $(c_V,c_f)$ plane.

For the sake of understanding the impact on the size, it is easier to focus on frequentist marginalisation. Frequentist marginalisation can be seen as an approximation of Bayesian marginalisation, so that the same explanation holds for both. The frequentist marginalisation consists of a maximisation over the nuisance parameter (say $\delta_X$) at any point in the space of the parameters of interest. This means that the value of $\delta_X$ at a given point is chosen so as to maximise the goodness-of-fit. Now, this improvement of the goodness-of-fit is typically larger for the points far away from the best-fit point than for those close to it. When this is true (which is usually the case), marginalising tends to enlarge the best-fit regions.

The effect of the nuisance parameters on the location of the best-fit regions in the $(c_V,c_f)$ plane can

Hence, in the case of neither experimental nor theoretical correlations, the same likelihood (with a sum in quadrature) as in the Bayesian framework arises.
[Figure 4 panel: Gaussian prior, no correlations]
Figure 4: The best-fit regions in the $(c_V,c_f)$ plane obtained from Bayesian marginalisation
and Gaussian priors for the theoretical uncertainties. The 68%, 95% and 99% credible regions are 
represented respectively by the green, yellow and grey domains. No theoretical error correlations 
between the Higgs detection channels are taken into account in this figure. The dashed contours 
illustrate the case without theoretical uncertainties. The SM prediction is shown by the red point. 

be understood as follows. Recall that the nuisance parameters enter the likelihood as $\mu_i^{\rm exp}(1+\delta_{X_i}\Delta_i)$ (see Eq. (6.32)), so that they shift the central experimental value of the signal strength. This in turn can induce a change in the location of the best-fit point in the $(c_V,c_f)$ plane. Such a shift actually occurs if a non-zero value of $\delta_X$ is preferred. This happens when a non-zero value of $\delta_X$ helps to relax the tensions (i.e. different preferred values of $c_V,c_f$) among the various signal strengths $\mu_i^{\rm exp}$. Notice that this means that the likelihood itself favours a non-zero value of $\delta_X$, even though the prior of $\delta_X$ is centred on zero.
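The mechanism can be seen in a toy with a single parameter of interest $\mu$ and two channels whose theoretical errors are anti-correlated through one nuisance parameter (signs $\zeta_1=+1$, $\zeta_2=-1$), with a unit Gaussian constraint on $\delta$. All numbers are made up for illustration and the function names are hypothetical. Minimising $\chi^2(\mu,\delta)=\sum_i(\mu-x_i(1+\zeta_i\delta\Delta_i))^2/\sigma_i^2+\delta^2$ is a linear problem, solved here with Cramer's rule.

```python
def chi2(mu, delta, x, sigma, zeta, rel_err):
    """Toy chi^2 with one shared nuisance parameter and a unit Gaussian constraint."""
    return sum((mu - xi * (1.0 + zi * delta * di)) ** 2 / si ** 2
               for xi, si, zi, di in zip(x, sigma, zeta, rel_err)) + delta ** 2


def best_fit(x, sigma, zeta, rel_err):
    """Solve d chi2/d mu = d chi2/d delta = 0 (2x2 linear system, Cramer's rule)."""
    w = [1.0 / s ** 2 for s in sigma]
    b = [xi * zi * di for xi, zi, di in zip(x, zeta, rel_err)]  # shift per unit delta
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swb = sum(wi * bi for wi, bi in zip(w, b))
    swbb = sum(wi * bi * bi for wi, bi in zip(w, b))
    swbx = sum(wi * bi * xi for wi, bi, xi in zip(w, b, x))
    det = sw * (swbb + 1.0) - swb ** 2
    mu = (swx * (swbb + 1.0) - swb * swbx) / det
    delta = (swb * swx - sw * swbx) / det
    return mu, delta


if __name__ == "__main__":
    x, sigma, zeta, rel_err = [1.4, 0.8], [0.2, 0.2], [+1, -1], [0.1, 0.1]
    weighted_mean = sum(xi / s ** 2 for xi, s in zip(x, sigma)) / sum(1 / s ** 2 for s in sigma)
    mu, delta = best_fit(x, sigma, zeta, rel_err)
    print(f"no nuisance: mu = {weighted_mean:.3f}; profiled: mu = {mu:.3f}, delta = {delta:.3f}")
```

With these inputs the preferred $\delta$ is close to $-1$, pulling the two measurements towards each other, and the best-fit $\mu$ moves from 1.10 to about 1.07: a non-zero nuisance parameter favoured by the likelihood shifts the location of the best fit.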

7.4.1 The forbidden case: no correlations 

Following our overview approach, let us start with the simplest case: the Bayesian marginalisation in the absence of correlations between the theoretical errors of the different Higgs channels. Let us take for instance a Gaussian prior (taking a flat one would not change our conclusions). This case was described in more detail at the beginning of Section 7.1 as well as in Section 7.2. In this “de-correlated” case, the likelihood is simply the primary likelihood (5.1) with a summation in quadrature of the absolute experimental and theoretical errors, $(\Delta\mu_i^{\rm exp})^2+(\mu_i^{\rm exp}\Delta_i)^2$. The best-fit domains in the $(c_V,c_f)$ plane are derived following the standard procedure described in Section 2, and are shown in Fig. (4). Here and throughout Section 7.4, the priors for $c_V,c_f$ are taken flat, $\pi(c_V,c_f)\propto1$.

We see in this figure that the theoretical SM prediction ($c_V=c_f=1$) lies well within the 68% C.L. region. Physically, this implies that, with such a fit, no physics beyond the SM is required to interpret the 8 TeV LHC measurements of the Higgs rates. The increase

The acronym C.L. stands for Credible Level within the Bayesian framework and for Confidence Level in the frequentist framework.
Figure 5: The best-fit regions in the $(c_V,c_f)$ plane obtained from Bayesian marginalisation and
flat priors for the theoretical uncertainties. The 68%, 95% and 99% credible regions are represented 
respectively by the green, yellow and grey domains. The [a] and [b] plots correspond, respectively, 
to the two characteristic correlation configurations described in Eq. (7.4) and Eq. (7.5). The dashed 
contours illustrate the case without theoretical uncertainties. The SM prediction is shown by the 
red point. 

of the best-fit domain sizes induced by the existence of theoretical errors is relatively weak, due to the sum in quadrature, as observed when comparing with the best-fit regions obtained with vanishing theoretical errors. The latter regions are superimposed on Fig. (4) for illustration purposes (as the dashed contours) and to ease the comparison with the next plots. However, let us recall that the likelihood used here (and leading to the colored regions of Fig. (4)) is not realistic, as the correlations among the Higgs channels should not be neglected. We thus do not recommend the use of this likelihood.

7.4.2 Flat prior 

From now on we consider the more realistic likelihoods obtained in Section 7.2. These likelihoods contain all the correlations between Higgs channels induced by the theoretical uncertainties. First, we consider the configuration with two independent nuisance parameters (see Eq. (7.2) and Eq. (7.4)). The Bayesian marginalisation over these two nuisance parameters leads to the analytical likelihood (7.12) for flat final priors. Applying the standard Bayesian procedure described in Section 2, we find the best-fit regions of Fig. (5) [left].

By comparing the colored plots in Fig. (4) and Fig. (5) [left], one clearly observes a shift of the best-fit regions. This shift originates from the theoretical correlations that are taken into account in Fig. (5) [left]: the relaxation of the tensions between the individual signal strength measurements (see the discussion in the introduction of Section 7.4) is different in the correlated case and in the “de-correlated” one. We emphasise that this shift is a consequence of taking into account the theoretical correlations. Indeed
Figure 6: The best-fit regions in the $(c_V,c_f)$ plane obtained from Bayesian marginalisation
and Gaussian priors for the theoretical uncertainties. The 68%, 95% and 99% credible regions are 
represented respectively by the green, yellow and grey domains. The [a] and [b] plots correspond, 
respectively, to the two characteristic correlation configurations described in Eq. (7.4) and Eq. (7.5). 
The dashed contours illustrate the case without theoretical uncertainties. The SM prediction is 
shown by the red point. 


we will see in the next subsection that the same effect occurs for a different prior shape. Concerning the region size, a slight increase occurs relative to Fig. (4). This comparison can be done by looking at the reference case (dashed contours) without any theoretical errors, which is once more superimposed on Fig. (5) [left].

The plot on the right-hand side of Fig. (5) is the same as the left plot but for the second correlation configuration, involving a single nuisance parameter (discussed in Eq. (7.3) and Eq. (7.5)). The effect of the theoretical correlations (relative to Fig. (4)) appears to be softer than for the left plot: the shift is smaller. This difference between the two colored regions of Fig. (5) makes clear that the theoretical correlations have an important impact on the fits and should thus be carefully taken into account.

As described below Eq. (7.3), the most realistic correlation configuration is most probably intermediate between those adopted in the two plots of Fig. (5). We thus conclude that, with the statistical treatment adopted here, the SM prediction remains in good agreement ($1\sigma$ level) with the 8 TeV LHC Higgs data, even once realistic theoretical correlations are taken into account.

7.4.3 Gaussian prior 

Fig. (6) illustrates the same case as Fig. (5), except that the final priors are now Gaussian, which leads to the marginalised Bayesian likelihood of Eq. (7.9) and Eq. (7.11). It

At this stage, we recall that the Gaussian priors are obtained from a combination of all the individual priors, while the flat priors have just been chosen ‘by hand’ to illustrate what happens for completely different distributions.
Figure 7: The data-dominated posterior $p(\delta_{\rm ggF}|\mu^{\rm exp})$ (Eq. (7.19)). The 68%, 95% and 99% credible
domains are indicated respectively by the green, yellow and grey areas. 

appears that there is no substantial difference (neither in location, size nor shape of the best-fit regions) between these two figures. This illustrates the mild impact of the choice of the shape of the prior for the theoretical uncertainties. We conclude that, with the present statistical uncertainties on Higgs data, the recurring question of the exact shape of the prior, in particular for the errors due to truncated perturbative expansions in QCD, is nearly irrelevant.

However we should stress that this insensitivity to the prior shape occurs because the 
experimental uncertainties of the current data are typically larger or of the same order as 
the theoretical ones. This situation is expected to change with the upcoming LHC runs, 
as the statistical uncertainties will decrease with the integrated luminosity. 

7.4.4 The nuisance parameters favoured by the data 

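Let us now consider the posterior distribution for the theoretical uncertainties themselves, instead of the posterior for the parameters of interest. Here we take the priors associated with the theoretical uncertainties ($\pi_X$) as flat and with an infinite range. For such a choice of prior, the information of the posterior is fully contained in the likelihood (second line in Eq. (7.19)). The interest of this data-dominated posterior is that it allows us to study exclusively the information that the Higgs data alone provide about the theoretical uncertainties $\Delta_i$.

We first consider the case with a single nuisance parameter $\delta_{\rm ggF}$ (i.e. the fully correlated case), given in Eq. (7.3), and we present in Fig. (7) the data-dominated posterior for $\delta_{\rm ggF}$,

$$
\begin{aligned}
p(\delta_{\rm ggF}\,|\,\mu^{\rm exp}) \propto {}& \int dc_V\,dc_f\,\pi(c_V,c_f)\\
&\times\exp\Big[-\frac12\sum_{i,j}\big(\mu_i[c_V,c_f]-\mu_i^{\rm exp}(1+\zeta_i\,\delta_{\rm ggF}\Delta_i)\big)\,(C^{-1})_{ij}\,\big(\mu_j[c_V,c_f]-\mu_j^{\rm exp}(1+\zeta_j\,\delta_{\rm ggF}\Delta_j)\big)\Big]\,,
\end{aligned}
\tag{7.19}
$$

where $\zeta_i$ is defined as in Eq. (7.8), with $\Omega^+=\Omega_{\rm ggF}$ for the configuration of Eq. (7.3).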

Including the details of the form at the boundaries in case e.g. of a flat distribution. 
This posterior is obtained by integrating the likelihood of Eq. (6.32) (with $\tilde\pi$ given by Eq. (7.5)) over all $\delta$'s but one, chosen to be $\delta_{\rm ggF}$, and marginalising with respect to the $c_V,c_f$ parameters with $\pi(c_V,c_f)\propto1$.

It appears in Fig. (7) that the posterior for $\delta_{\rm ggF}$ is centred on $\delta_{\rm ggF}\approx-1$. This means that, for each signal strength, the data typically favour a value lying at $1\sigma$ (i.e. at $\pm\Delta_i$) from the nominal value. In other words, for the correlation configuration of Eq. (7.3), the Higgs data provide a non-trivial indication that the magnitudes of the theoretical errors are reasonably well estimated. Indeed, the theoretical estimations predict the deviations to lie typically within the $1\sigma$ interval $\pm\Delta_i$.

This compatibility suggests that the $\Delta_i$ uncertainties, whose estimations rely on quite ad hoc QCD scale variations and on the arbitrariness in the choice of PDF sets, are nevertheless quite robust. On the other hand, one also notices in Fig. (7) that the credible intervals for $p(\delta_{\rm ggF}|\mu^{\rm exp})$ go beyond $-1$. This could be taken as an argument for slightly increasing the overall magnitude of the theoretical uncertainties (see next subsection).

The correlation configuration with two nuisance parameters, given by Eq. (7.2), leads to larger preferred values for the nuisance parameters, $\delta_{\rm ggF}\approx-2$ and $\delta_{\rm VBF}\approx-5$. We interpret these very large values as a sign that totally neglecting the correlation between the two nuisance parameters is an unrealistic hypothesis (as already described in Section 7.1). As a matter of fact, if one restored the usual prior for the $\delta$'s (i.e. a prior with unit variance, $V[\delta]=1$), a hypothesis test would show that the data favour the correlation configuration of Eq. (7.3) with respect to the configuration of Eq. (7.2).

7.4.5 More conservative theoretical errors 

Throughout this paper, we have observed that, among the various origins of theoretical uncertainty involved in the Higgs fit, some are of a nature (see Sections 6.1-6.5) that makes the exact determination of the associated $1\sigma$ interval difficult. These are the truncation of the perturbative expansion for the QCD calculation of Higgs rates, translated into an arbitrary error range for the renormalisation/factorisation scale $\mu=\mu_R=\mu_F$ (affecting the production and decay amplitudes as well as the $\alpha_s$ coupling constant), the choices made (on the statistical method, the number of free parameters...) in the different PDF sets, and finally the $m_t$ renormalisation scheme and EFT assumptions for the ggF mechanism. These considerations can be taken as a motivation to adopt more conservative theoretical errors.

Moreover, we have seen in the previous subsection (see Fig. (7)) that the data tend to prefer theoretical uncertainties that are somewhat larger than the combined $1\sigma$ width $\Delta_i$ obtained in Section 6, see e.g. the 68% C.L. interval in Fig. (7). Taking this fact seriously, it makes sense to perform the fits with a slight overall increase of the uncertainties. We suggest a rescaling

$$
\Delta_i \to 1.5\,\Delta_i \tag{7.20}
$$

For comparison, the maximum of $p(\delta_{\rm ggF},c_V=c_f=1\,|\,\mu^{\rm exp})$ is reached for $\delta_{\rm ggF}\approx-0.7$.
Figure 8: The best-fit regions in the $(c_V,c_f)$ plane obtained from Bayesian marginalisation and
flat priors for the theoretical uncertainties. The 68%, 95% and 99% credible regions are represented
respectively by the green, yellow and grey domains. The [a] and [b] plots correspond, respectively,
to the characteristic correlation configurations described in Eq. (7.4) and Eq. (7.5). The dashed
contours illustrate the case without theoretical uncertainties. The SM prediction is shown by the
red point. The difference with Fig. (5) is the enhancement of the uncertainties, according to
$\Delta_i\to1.5\,\Delta_i$.


as a reasonable estimate for a more conservative choice of theoretical uncertainties. Notice that the rescaling of Eq. (7.20) is equivalent (cf. Eq. (6.32)) to rescaling the $\delta_{\rm ggF}$ axis of Fig. (7) by 1.5. For example, the point $\delta_{\rm ggF}=-1$ with the rescaled errors corresponds to $\delta_{\rm ggF}=-1.5$ with the original ones.

The best-fit regions with $\Delta_i\times1.5$ are shown in Fig. (8) for the two correlation configurations in the flat prior case (Eq. (7.12)), keeping in mind that, with the current Higgs data, the final prior shape does not significantly affect those best-fit domains. The impact of the increase of the theoretical uncertainties (Eq. (7.20)) on the fit of the current Higgs data can be seen by comparing Fig. (5) and Fig. (8). It turns out that the shift of the preferred regions with respect to the case without theoretical errors is slightly accentuated. In the correlation configuration of Eq. (7.4), i.e. with two independent $\delta_X$, it even appears (see Fig. (8) [left]) that the SM point moves just outside the 68% C.L. region.

The increase of this shift can be understood by recalling that rescaling the $\Delta_i$ is equivalent to increasing the width of the $\delta_X$ prior. More possibilities are then opened for the preferred values of $\delta_X$. It turns out that these preferred values move further away from zero, which induces a more pronounced shift of the best-fit regions.

Even though these effects are not statistically significant for the current Higgs data, we stress that the impact of the theoretical errors will increase as more data are accumulated at the LHC. The ambiguity in the estimation of the theoretical errors thus deserves to be taken into account. For future LHC phenomenological studies, we suggest taking into account, in the same way as proposed in this subsection, the impact on the fits of the lack of knowledge of the theoretical errors.











8 Biasing the Higgs likelihood 


The principle of bias has been presented in Section 2.2.2. To keep this section self-contained, we recall here the basics of a “biasing” procedure. We distinguish two realisations of the bias principle: the extremal bias and the envelope method.

The method of extremal biasing consists in drawing the best-fit regions for the parameters of interest at extreme fixed values of the theoretical errors. By the word ‘extreme’, we mean that we set the nuisance parameters $\delta$ at $\pm 1$ (corresponding to one standard deviation with our conventions) in order to obtain a strong impact on the fit. In our Higgs fit, the theoretical uncertainties affect the signal strengths, which in turn modify the preferred values of $\mu_i(c_f,c_V)$ and thus the best-fit regions of $c_V, c_f$. Note that the choice of the extreme values $\delta = \pm 1$ can be seen as natural, and for that reason is used in our numerical results. Strictly speaking, however, it remains a choice with a certain degree of arbitrariness.

The envelope method formally corresponds to the continuous version of this extremal biasing. Loosely speaking, this is what one obtains by performing the fit for each fixed value of the nuisance parameters between the extreme values $\delta = \pm 1$. One typically expects a deformed contour interpolating between the regions of extremal biasing. For a more formal and unified description of these biasing methods, see Section 2.2.2.

What are the motivations for choosing the marginalisation or the bias approaches (extremal bias or envelope method) in the Higgs fits? The lack of knowledge of the shape of the prior associated with the main QCD uncertainties discussed in Section 6.3 encourages one to apply a bias method. Unlike marginalisation, a bias method does not rely on the prior shape.

Besides, the bias is more conservative. In marginalisation, the best-fit domain corresponds roughly to nuisance parameters centred around a preferred $\delta_X$ value. In the bias methods, by contrast, $\delta_X$ spans its $[-1,1]$ interval by construction, without favouring any value. Hence, generally speaking (and this is the case for the Higgs fit), the best-fit regions in the space of the parameters of interest obtained through the bias methods are wider than those obtained by marginalising.

In addition, the envelope method allows one to see at a glance the whole best-fit domain in the $c_V$-$c_f$ plane spanned by varying the nuisance parameters over their entire $[-1,1]$ intervals. The price to pay is a technically heavier approach than the marginalisation procedure. Compare the marginalisation definitions in Eqs. (2.9), (2.10) with the biasing definitions in Eqs. (2.14), (2.17) (see for example Eq. (6.32) and Eq. (8.5) for the application to the Higgs likelihood). More operations (either integrations or maximisations) are clearly needed for the envelope method.


8.1 Combining the uncertainties 

The starting point is the likelihood of Eq. (5.1), and then of Eq. (5.11). We apply Eqs. (3.21)-(3.24) together with the definition of Eq. (6.28) and

$$\Delta_{X,i} = \frac{\epsilon_X^i\,\sigma_X^{\rm SM}}{\sum_{X'}\epsilon_{X'}^i\,\sigma_{X'}^{\rm SM}}\left(\Delta_X + \Delta_X^{\rm PDF+\alpha_s}\right), \qquad (8.1)$$

which is a new compact notation comparable to Eq. (6.19). We then obtain the likelihood depending on a unique nuisance parameter, $\delta_b$,


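$$L_{\rm bias}(\delta_b) = \exp\left[-\frac{1}{2}\sum_{i,j}\left(\mu_i(c_V,c_f) - \hat\mu_i\,(1+\delta_b\Delta_i)\right)\left(C^{-1}\right)_{ij}\left(\mu_j(c_V,c_f) - \hat\mu_j\,(1+\delta_b\Delta_j)\right)\right], \qquad (8.2)$$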

relying on the combined error, 

= l^ggF,i “ + ^VBF,i + ^WH,i + ^ZH,i + + ^Y,i) , (8.3) 

Y,a 


or. 


— |^ggF,i “ + ^VBF,i + + ^ZH,i)| + '^i^Y,i + ‘^Y,i) , (8.4) 

Y,a 

for the two configurations of correlations defined in Eq. (7.2)-(7.3), respectively. 

The combinations of the errors on the partial decay widths are dictated by the properties of their nuisance parameters. These are either independent (among themselves and from the nuisance parameters at the production level) or taken 100% correlated with each other, as discussed in Section 6.5.

In Eq. (8.1), $\Delta_X$ is either equal to the error of Section 6.2 or, for the ggF channel, taken as the linear sum of that error and $\Delta_{\rm ggF}^{\rm EFT}$ (instead of Eq. (6.11)). Here $\Delta_{\rm ggF}^{\rm EFT} \simeq 9\%$ is now obtained from the linear sum of the three errors originating from EFT assumptions and from the $m_b$ scheme dependence [26]. These linear summations are all motivated by the fact that these errors are independent. The uncertainty $\Delta_X^{\rm PDF+\alpha_s}$ entering Eq. (8.1) is obtained from Refs. [20, 52] using an “envelope method”. This method corresponds exactly to the combination in the bias approach presented in Section 3.5. Indeed, this combination is equivalent to a linear sum of the individual errors entering the envelope, which are independent (cf. Section 6.1). Finally, the linear sum in Eq. (8.1) is justified by the independence of the errors $\Delta_X$ and $\Delta_X^{\rm PDF+\alpha_s}$.
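Why do independent errors add linearly in the bias approach? Each nuisance parameter is only known to lie in $[-1,1]$, with no prior to weight its values. The largest total shift $|\sum_k \delta_k \Delta_k|$ is then reached at a corner of the box, and it equals $\sum_k |\Delta_k|$. The sketch below checks this by brute force and compares it with two alternatives: a single shared $\delta$ (100% correlation), and the quadrature sum that a unit-variance marginalisation would give as a standard deviation. The error values and function names are hypothetical and chosen only for illustration. They are not the LHCHWG numbers.

```python
import itertools
import math


def extreme_shift(errors):
    """Largest |sum_k delta_k * Delta_k| when each independent delta_k lies in [-1, 1].

    The expression is linear in every delta_k, so its extremum over the box
    [-1, 1]^n sits at a vertex: scanning the 2^n sign choices is exact.
    """
    return max(abs(sum(s * d for s, d in zip(signs, errors)))
               for signs in itertools.product((-1.0, 1.0), repeat=len(errors)))


def linear_sum(errors):
    """Bias-type combination of independent errors: sum of absolute values."""
    return sum(abs(d) for d in errors)


def correlated_shift(errors):
    """One shared delta in [-1, 1] (100% correlation): |sum_k Delta_k|."""
    return abs(sum(errors))


def quadrature_sum(errors):
    """Standard deviation of sum_k delta_k * Delta_k for independent unit-variance delta_k."""
    return math.sqrt(sum(d * d for d in errors))


# Hypothetical relative 1-sigma errors, for illustration only.
errors = [0.07, -0.05, 0.02]
print(extreme_shift(errors), linear_sum(errors), correlated_shift(errors), quadrature_sum(errors))
```

With opposite-sign entries, the fully correlated case can partly cancel. The independent case cannot, which is why the two correlation assumptions lead to different combined errors.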

The $1\sigma$ errors ($\Delta$'s) are taken to be exactly the symmetrised errors provided by the LHCHWG [17, 18, 20], in order to be conservative (see the similar discussion in Sections 6.3 and 6.5). These errors are consistent with the previous marginalisation framework, so that the results from bias and marginalisation can readily be compared.


8.2 The Bayesian approach 
8.2.1 Extremal bias 

According to Section 2.2.2, the extremal bias within the Bayesian framework consists in deriving the best-fit regions in the $c_V$-$c_f$ plane for two fixed values of the nuisance parameter, $\delta_b = \pm 1$, using the likelihood $L_{\rm bias}(\delta_b)$ of Eq. (8.2). Recall that in the Bayesian case, the best-fit regions are computed by integrating the posterior probability density, according to Eqs. (2.3)-(2.5). The priors $\pi(\theta)$ for the parameters of interest (here $\theta = c_V, c_f$) entering Eq. (2.3) are taken flat, i.e. $\pi(c_{V,f}) \propto 1$.

Note that, if the two extreme regions overlap, they cannot be consistently displayed together. Instead, one has to follow the rigorous definition of Eq. (2.14), using a discrete domain $\mathcal{D} = \{-1,1\}$. This equation prescribes the sum of the posteriors at $\delta_b = -1$ and $\delta_b = 1$, each posterior being separately normalised by its integral over the $c_V$-$c_f$ plane.
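The discrete sum of Eq. (2.14) can be sketched on a deliberately simplified toy. The following assumptions are not the paper's setup:

- a single hypothetical parameter of interest $c$ replaces the $c_V$-$c_f$ plane, with $\mu_i(c) = c$;
- there are two uncorrelated channels with invented values;
- the likelihood is $L \propto \exp[-\frac{1}{2}\sum_i (c - \hat\mu_i(1+\delta_b\Delta_i))^2/\sigma_i^2]$.

Averaging the separately normalised posteriors over $\{-1,1\}$ gives the extremal bias. Averaging over a fine grid of $[-1,1]$ approximates the continuous version used by the envelope method of Section 8.2.2. In this particular toy, the integral over $c$ happens not to depend on $\delta_b$. The normalisation is kept anyway, because Eq. (2.14) requires it in general.

```python
import math

# Hypothetical toy inputs (NOT the LHC data): measured signal strengths,
# uncorrelated experimental errors (diagonal C) and combined theory errors.
MU_HAT = [1.1, 0.8]
SIGMA = [0.20, 0.25]
DELTA = [0.10, 0.05]

DC = 0.002
C_GRID = [0.3 + DC * k for k in range(701)]   # single parameter of interest c in [0.3, 1.7]


def likelihood(c, delta_b, deltas=DELTA):
    """Toy biased likelihood with mu_i(c) = c and a diagonal covariance matrix."""
    chi2 = sum((c - m * (1.0 + delta_b * d)) ** 2 / s ** 2
               for m, s, d in zip(MU_HAT, SIGMA, deltas))
    return math.exp(-0.5 * chi2)


def normalised(values):
    """Divide by a Riemann-sum estimate of the integral over c (flat prior on c)."""
    norm = sum(values) * DC
    return [v / norm for v in values]


def biased_posterior(domain, deltas=DELTA):
    """Average over delta_b in `domain` of posteriors that are each normalised over c."""
    post = [0.0] * len(C_GRID)
    for delta_b in domain:
        p = normalised([likelihood(c, delta_b, deltas) for c in C_GRID])
        post = [a + b / len(domain) for a, b in zip(post, p)]
    return post


def hpd_width(density, level=0.68):
    """Total width of the highest-density grid points holding `level` of the probability."""
    total, count = 0.0, 0
    for value in sorted(density, reverse=True):
        if total >= level:
            break
        total += value * DC
        count += 1
    return count * DC


extremal = biased_posterior([-1.0, 1.0])                            # discrete domain {-1, 1}
envelope = biased_posterior([-1.0 + 0.02 * k for k in range(101)])  # grid on [-1, 1]
no_theory = biased_posterior([0.0], deltas=[0.0, 0.0])
print(hpd_width(no_theory), hpd_width(envelope), hpd_width(extremal))
```

With these invented inputs, both biased regions should come out wider than the reference without theoretical errors. The discrete extremal version is the widest, because its two components sit at the ends of the $\delta_b$ range.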

8.2.2 Envelope method 

The envelope method corresponds to letting $\delta_b$ vary continuously within $[-1,1]$. It is thus the continuous version of the extremal bias, as discussed in Section 2.2.2. The corresponding likelihood is

$$L_B(c_f,c_V) = \int_{-1}^{1} d\delta_b\,\frac{L_{\rm bias}(c_f,c_V,\delta_b)}{\int dc_f\int dc_V\,L_{\rm bias}(c_f,c_V,\delta_b)}\,. \qquad (8.5)$$

This likelihood is derived by applying Eq. (2.14) with the likelihood $L_{\rm bias}(c_V,c_f,\delta_b)$ of Eq. (8.2). The best-fit regions are obtained through the standard procedure of Eqs. (2.3)-(2.5). Again, we take the priors for the parameters of interest to be flat, $\pi(c_{V,f}) \propto 1$.

8.3 The frequentist approach
8.3.1 Extremal bias

For the extremal bias in the frequentist framework (see Section 2.2.2), one again uses the likelihood $L_{\rm bias}(\delta_b)$ of Eq. (8.2), with $\delta_b$ fixed at the two extreme values $\delta_b = \pm 1$. In practice, in order to draw the best-fit regions in the $c_V$-$c_f$ plane, one can define the $\chi^2$ difference

$$\Delta\chi^2(c_f,c_V,\delta_b) = \chi^2(c_f,c_V,\delta_b) - \chi^2_{\rm min}(\delta_b)\,, \qquad \chi^2(c_f,c_V,\delta_b) = -2\log\left[L_{\rm bias}(\delta_b)\right], \qquad (8.6)$$

as follows from Eq. (2.6). Recall that $\chi^2_{\rm min}(\delta_b)$ stands for the minimum of $\chi^2$ with respect to $c_f, c_V$ for a given $\delta_b$. The best-fit regions are obtained by drawing the contour levels of $\Delta\chi^2$ set at the values given in Eq. (2.8). Once more, the priors for the parameters of interest entering Eq. (2.6) are taken flat, $\pi(c_{V,f}) \propto 1$.

If the two extreme regions overlap, the same remark as in the Bayesian case holds. To display the two regions consistently together, one has to follow the rigorous definition of Eq. (2.17), using a discrete domain $\mathcal{D} = \{-1,1\}$. This equation prescribes the minimum of the two $\Delta\chi^2$, i.e. $\min_{\delta_b\in\{-1,1\}}\left[\Delta\chi^2(c_f,c_V,\delta_b)\right]$.
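The pointwise minimum of Eq. (2.17) over $\{-1,1\}$ can be illustrated on a one-parameter toy. This is a sketch under assumptions, not the paper's fit:

- the parameter of interest is a hypothetical $c$ with $\mu_i(c) = c$;
- there are two uncorrelated channels with invented numbers;
- the biased likelihood is $L \propto \exp[-\frac{1}{2}\sum_i (c - \hat\mu_i(1+\delta_b\Delta_i))^2/\sigma_i^2]$;
- the cut $\Delta\chi^2 < 1$ is the usual 68% C.L. value for one parameter. It is not the two-parameter values of Eq. (2.8).

```python
# Hypothetical toy inputs (NOT the LHC data); one parameter of interest c with mu_i(c) = c.
MU_HAT = [1.1, 0.8]
SIGMA = [0.20, 0.25]
DELTA = [0.10, 0.05]

DC = 0.001
C_GRID = [0.3 + DC * k for k in range(1401)]   # c in [0.3, 1.7]


def chi2(c, delta_b, deltas=DELTA):
    """-2 log of the toy biased likelihood, diagonal covariance."""
    return sum((c - m * (1.0 + delta_b * d)) ** 2 / s ** 2
               for m, s, d in zip(MU_HAT, SIGMA, deltas))


def delta_chi2(delta_b, deltas=DELTA):
    """Eq. (8.6)-style difference: chi^2 minus its grid minimum over c at fixed delta_b."""
    values = [chi2(c, delta_b, deltas) for c in C_GRID]
    best = min(values)
    return [v - best for v in values]


def region(curve, threshold=1.0):
    """Grid points below the threshold; 1.0 is the 68% C.L. cut for ONE parameter."""
    return [c for c, v in zip(C_GRID, curve) if v < threshold]


# Discrete domain {-1, 1}: pointwise minimum of the two Delta chi^2 curves.
curves = [delta_chi2(d) for d in (-1.0, 1.0)]
extremal = region([min(pair) for pair in zip(*curves)])
no_theory = region(delta_chi2(0.0, deltas=[0.0, 0.0]))
print(min(extremal), max(extremal), min(no_theory), max(no_theory))
```

Here each fixed-$\delta_b$ region is the interval $\hat c(\delta_b) \pm 1/\sqrt{\sum_i \sigma_i^{-2}}$. The extremal region is therefore the union of the two shifted intervals.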

8.3.2 Envelope method 

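For the envelope method in the frequentist case, one can proceed with the $\chi^2$ introduced in Eq. (8.6) and define

$$\tilde\chi^2(c_f,c_V) = \min_{\delta_b\in[-1,1]}\left[\chi^2(c_f,c_V,\delta_b) - \chi^2_{\rm min}(\delta_b)\right], \qquad (8.7)$$

where $\chi^2_{\rm min}(\delta_b)$ is the minimum of $\chi^2(c_f,c_V,\delta_b)$ with respect to $c_f, c_V$ at fixed $\delta_b$. This follows the general definition of Eq. (2.17). This equation is the frequentist analogue of Eq. (8.5). In order to draw the best-fit regions in the $c_V$-$c_f$ plane, one should then define

$$\Delta\tilde\chi^2(c_f,c_V) = \tilde\chi^2(c_f,c_V) - \tilde\chi^2_{\rm min}\,, \qquad (8.8)$$

with $\tilde\chi^2_{\rm min}$ the minimum of $\tilde\chi^2$ over the $c_V$-$c_f$ plane. The best-fit regions are obtained by drawing the contour levels of $\Delta\tilde\chi^2$ set at the values given in Eq. (2.8). Again, the priors for the $c_V, c_f$ parameters entering Eq. (2.6) are taken flat, $\pi(c_{V,f}) \propto 1$.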

Let us finally recall the parallel between Eq. (8.5) and Eq. (8.7). As first explained in Section 2.2.2, the subtracted term in Eq. (8.7) is the frequentist analogue of the division by $\int dc_f\,dc_V\,L_{\rm bias}(c_f,c_V,\delta_b)$ in Eq. (8.5). In both cases, this term removes the contribution of $\delta_b$ to the goodness-of-fit, which avoids favouring specific values of $\delta_b$. The two formulas are analogous up to exchanging the integration over $\delta_b$ with a minimisation over $\delta_b$. The integration or minimisation over $\delta_b$ is performed on the whole range $[-1,1]$ rather than on the discrete domain $\{-1,1\}$. This leads to an envelope in the $c_f$-$c_V$ plane, instead of two distinct domains as in the extremal bias.
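The difference between the discrete and the continuous domain in Eq. (8.7) can be made concrete on a one-parameter toy. The assumptions are labelled, and the numbers are invented and not LHC data:

- the parameter of interest is a hypothetical $c$ with $\mu_i(c) = c$;
- the covariance is diagonal;
- the biased likelihood is $L \propto \exp[-\frac{1}{2}\sum_i (c - \hat\mu_i(1+\delta_b\Delta_i))^2/\sigma_i^2]$;
- the continuous range $[-1,1]$ is approximated by a grid that includes the endpoints.

In this toy, the edges of the fixed-$\delta_b$ regions move monotonically with $\delta_b$. The envelope limits are therefore set by $\delta_b = \pm 1$, mirroring how the extremal contours delimit the frequentist envelopes.

```python
import math

# Hypothetical toy inputs (NOT the LHC data); one parameter of interest c with mu_i(c) = c.
MU_HAT = [1.1, 0.8]
SIGMA = [0.20, 0.25]
DELTA = [0.10, 0.05]
C_GRID = [0.3 + 0.001 * k for k in range(1401)]   # c in [0.3, 1.7]


def chi2(c, delta_b):
    """-2 log of the toy biased likelihood, diagonal covariance."""
    return sum((c - m * (1.0 + delta_b * d)) ** 2 / s ** 2
               for m, s, d in zip(MU_HAT, SIGMA, DELTA))


def biased_chi2(domain):
    """Eq. (8.7)-style: min over delta_b in `domain` of chi^2 minus its minimum over c."""
    result = [math.inf] * len(C_GRID)
    for delta_b in domain:
        values = [chi2(c, delta_b) for c in C_GRID]
        best = min(values)
        result = [min(r, v - best) for r, v in zip(result, values)]
    return result


def region(curve, threshold=1.0):
    """Eq. (8.8)-style: subtract the global minimum, keep points below the cut
    (1.0 is the 68% C.L. value for ONE parameter)."""
    floor = min(curve)
    return [c for c, v in zip(C_GRID, curve) if v - floor < threshold]


envelope = region(biased_chi2([-1.0 + 0.02 * k for k in range(101)]))  # continuous [-1, 1]
extremal = region(biased_chi2([-1.0, 1.0]))                            # discrete {-1, 1}
print(min(envelope), max(envelope), min(extremal), max(extremal))
```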

8.4 Numerical results 

In this section, we apply both the frequentist and Bayesian versions of the bias method to the Higgs likelihood. We stress that the Higgs likelihood $L_{\rm bias}(\delta_b)$ is exactly the same in the two statistical frameworks. The discrepancies observed among the plots therefore originate solely from the different statistical treatments. These two treatments differ in their definition of the best-fit regions (see Section 2.1) and in their realisation of the bias principle (see Eqs. (2.14), (2.17)).

8.4.1 Extremal bias 

In Fig. (9), we present the best-fit regions obtained through the Bayesian and frequentist bias methods, described in Sections 8.2.1 and 8.3.1 respectively. The likelihood $L_{\rm bias}(\delta_b)$ of Eq. (8.2) is used together with one of the two combined errors (8.3)-(8.4). The choice depends on which correlation configuration is considered (Eq. (7.2) or Eq. (7.3), respectively).

The left and right panels of Fig. (9) correspond to the two correlation configurations bracketing the case with realistic correlations. The best-fit regions obtained in these two extreme correlation configurations turn out to show only mild differences.

Now compare the two upper and the two lower plots of Fig. (9), corresponding respectively to the frequentist and Bayesian treatments. A small difference appears at the junction of the two sets of regions. It comes from the different realisations of the bias principle in the two statistical frameworks. Besides, the frequentist best-fit regions are slightly larger than the Bayesian ones, due to the non-equivalent definitions of the Bayesian and frequentist contours. Overall, there is a strong resemblance between the Bayesian and frequentist results. This reflects the weak impact of choosing the Bayesian or frequentist procedure for the extremal bias.

Let us now compare the lower plots of Fig. (9) with the previous Bayesian marginalisation plots of Fig. (5), considering of course the two correlation configurations used in the left and right plots, respectively. The best-fit regions obtained from the extremal bias are clearly larger than those obtained through marginalisation.

Notice that these best-fit regions essentially include the two extreme sub-domains corresponding to $\delta_b = \pm 1$.
This is because the regions in Fig. (5), derived by marginalising, roughly correspond to fixing the nuisance parameters at the values favoured by the fit. For the present Higgs fits, these preferred values turn out to be close to $\delta \simeq -1$. The regions from the extremal bias (Fig. (9)) are instead obtained for $\delta_b = \pm 1$, the lower-left set being for $\delta_b = -1$. They therefore clearly cover more space in the $c_V$-$c_f$ plane than the domains of Fig. (5).

8.4.2 Envelope method 

The four plots of Fig. (10) illustrate the Bayesian and frequentist envelope methods, performed according to Sections 8.2.2 and 8.3.2. Again, both correlation configurations, giving rise to the combined errors of Eqs. (8.3)-(8.4), are studied numerically. The upper and lower plots of Fig. (10) differ because the envelope method is not equivalent in the Bayesian and frequentist cases.

The sets of frequentist envelopes represent the best-fit areas that would be obtained by superimposing the best-fit regions of the extremal bias, but with $\delta_b$ spanning the interval $[-1,1]$ continuously. This correspondence between the envelope method and the extremal bias appears clearly from the end of Section 2.2.2. The former is based on Eqs. (8.7)-(8.8). The latter can be obtained through the same equations, with the minimisation in Eq. (8.7) taken over the discrete domain $\delta_b \in \mathcal{D} = \{-1,1\}$ instead of the continuous range $[-1,1]$. The correspondence is visible when comparing the envelopes with the extreme sets of best-fit domains at $\delta_b = \pm 1$. These were obtained previously from the frequentist bias method and are also superimposed on the upper plots of Fig. (10) as dashed contours. These contours draw exactly the extreme limits of the envelopes.

The two sets of Bayesian envelopes in the two lower plots of Fig. (10) represent less conservative regions than the frequentist envelopes. Besides, these envelopes cover smaller regions than the best-fit domains that would be obtained by superimposing the best-fit regions of the extremal bias with $\delta_b$ spanning the interval $[-1,1]$ continuously. This appears clearly when comparing these envelopes to the extreme sets of best-fit regions at $\delta_b = \pm 1$. These were obtained previously from the Bayesian bias method and are once more superimposed on the lower plots of Fig. (10) as dashed contours.

Finally, we mention that the SM point belongs to all the 68% C.L. regions of Fig. (10). At this level, we can illustrate one of the interests of the bias with a hypothetical but plausible situation. Suppose that, with future LHC data, the SM point falls outside the $3\sigma$ region obtained by marginalising. Such a discrepancy could be interpreted in two ways: either as an indirect effect of physics underlying the SM on the Higgs sector, or as a shift of the best-fit regions induced by values of the nuisance parameters statistically favoured by the fit. This shift would come from the fact that the nuisance parameters and the parameters of interest are determined simultaneously. In contrast, in the envelope method, an SM prediction falling beyond the $3\sigma$ region would indicate the presence of new physics. No alternative explanation relying on the statistical treatment would be available, since the entire interval of the nuisance parameters is covered. This example provides a motivation to apply both the bias and marginalisation methods, which are somewhat complementary.

The dependence of the best-fit region location on the nuisance parameter is induced by the dependence of the likelihood (8.2) on $\delta_b$.

Figure 9: The best-fit regions in the $c_V$-$c_f$ plane obtained through an extremal bias. The 68%, 95% and 99% confidence regions are represented respectively by the green, yellow and grey domains. The upper plots illustrate the frequentist approach whereas the two lower ones show the Bayesian approach. The [a], [c] and [b], [d] plots correspond, respectively, to the characteristic correlation configurations described in Eq. (8.3) and Eq. (8.4). The dashed contours illustrate the case without theoretical uncertainties. The SM prediction is shown by the red point.
Figure 10: The best-fit regions in the $c_V$-$c_f$ plane obtained through the envelope method. The 68%, 95% and 99% confidence regions are represented respectively by the green, yellow and grey domains. The upper plots illustrate the frequentist approach whereas the two lower ones show the Bayesian approach. The [a], [c] and [b], [d] plots correspond, respectively, to the characteristic correlation configurations described in Eq. (8.3) and Eq. (8.4). The dashed grey contours illustrate the best-fit regions at 68% C.L., 95% C.L. and 99% C.L. obtained in Fig. (9). The SM prediction is shown by the red point.

9 Conclusions 

The main goal of this analysis was to work out a consistent statistical treatment of the theoretical uncertainties in fits of the Higgs boson rates. We have analysed both the Bayesian and frequentist approaches to theoretical uncertainties in a unified formalism. We have systematically analysed how to perform error combinations in a given statistical context, and we have introduced a framework that puts the bias principle on firm ground.

This analysis has also given us the opportunity to update the Higgs rate fit based on the latest LHC data at 7 and 8 TeV. In the case of Bayesian marginalisation, we have found that the SM prediction for the Higgs couplings still falls into the 68% C.L. region of the $c_V$-$c_f$ plane. Bayesian marginalisation benefits from well-defined distributions for the nuisance parameters and from an easier convolution of these error distributions compared to frequentist marginalisation.

We have reviewed all the fundamental sources of the individual theoretical errors involved in the SM Higgs cross sections and branching ratios. These errors have then been combined in a careful step-by-step approach following the Bayesian rules. This task involved combining a significant number of uncertainties (various Higgs production modes, decay channels...). It was made easier by the leading moment approximation, which we deduced from considerations on the moment-generating function.

This has allowed us to show that the prior of the total uncertainty converges to a nearly Gaussian shape. This total uncertainty results from the combination of all the theoretical errors, using flat priors for the unknown ones. Besides, the numerical results showed that the precise form of this final theoretical prior is not crucial for the determination of the best-fit regions. This conclusion holds only for the present data, which still have large experimental errors compared with the theoretical ones.

In contrast, our analysis has shown that the correlations of the theoretical uncertainties among the Higgs detection channels induce a significant shift of the best-fit domains in the space of the parameters of interest. These correlations thus appear to be an unavoidable ingredient of the fits. The Higgs fits were performed in two extreme configurations of theoretical correlations between the various detection channels. The most realistic correlation setup is an intermediate configuration between those two, so that such an approach is conservative. Besides, considering characteristic configurations has allowed us to derive simple analytic expressions for the marginal likelihood functions.

For future Higgs fits, given the ambiguities inherent in the estimation of the theoretical error magnitudes, we recommend presenting an additional analysis as a conservative benchmark. In this analysis, the $1\sigma$ errors are enhanced by a typical factor of 1.5. Such a factor is consistent with the $1\sigma$ theoretical errors preferred by the data. Of course, the present degree of arbitrariness in the theoretical error magnitudes could be reduced, for instance, by future higher-order QCD calculations or new methods to determine the PDFs.

Finally, we have provided a rigorous statistical framework for the bias principle, which constitutes an alternative to marginalisation. This framework has led us to define two complementary bias treatments: the extremal bias and the envelope method. The bias principle is more conservative than marginalisation by construction. It also does not depend on the shape of the priors of the nuisance parameters, which are not always known. Therefore, a reasonable recommendation is to apply both the marginalisation and bias methods to the Higgs data. Using the envelope method, we find that the SM prediction belongs to the 68% C.L. region of the $c_V$-$c_f$ plane.
Appendix

A The leading moment approximation 

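Consider a linear combination $\delta_C$ of random variables $\delta_A$, $\delta_B$, given by

$$\delta_C\Delta_C = \delta_A\Delta_A + \delta_B\Delta_B\,, \qquad (A.1)$$

with $\Delta_B \ll \Delta_A$ and $E[\delta] = 0$, $V[\delta] = 1$ by convention. The pdfs of $\delta_A$, $\delta_B$, $\delta_C$ are denoted respectively $\pi_A$, $\pi_B$, $\pi_C$.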

We mainly work in Laplace space, using the moment-generating function

$$\phi_Z(t) = E\left[e^{tZ}\right] = \int dz\, e^{tz}\,\pi_Z(z)\,. \qquad (A.2)$$

If all moments are finite, $\phi_Z(t) = \sum_n \frac{t^n}{n!}\,m_n^Z$, where $m_n^Z$ denotes the $n$-th moment of $Z$.

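Since $m_1$ is the mean, we have $m_1^A = 0 = m_1^B$. Since the means vanish, $m_2$ is the variance, and we have $m_2^A = 1 = m_2^B$. Let us first assume that $\delta_A$, $\delta_B$ are uncorrelated, and more precisely independent. This implies that $\pi_{A,B} = \pi_A\pi_B$, and that the pdf of $\delta_C$ is given by a convolution product. It also implies that the moment-generating function of $\delta_C$ is given by the product

$$\phi_C(\Delta_C t) = \phi_A(\Delta_A t)\,\phi_B(\Delta_B t)\,. \qquad (A.3)$$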

Since $\Delta_B \ll \Delta_A$ by assumption, we can use $\Delta_B/\Delta_A$ as an expansion parameter. At leading order, neglecting the contribution from $\delta_B$ to the combination amounts to approximating

$$\phi_B(\Delta_B t) = 1 + \mathcal{O}(\Delta_B^2 t^2) \qquad (A.4)$$

in the product (A.3). This corresponds to approximating $\pi_B$ as a Dirac distribution centred on zero.
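Going one order further in the expansion amounts to keeping

$$\phi_B(\Delta_B t) = 1 + \frac{\Delta_B^2 t^2}{2} + \mathcal{O}(\Delta_B^3 t^3)\,. \qquad (A.5)$$

This subleading term induces $\mathcal{O}(\Delta_B^2/\Delta_A^2)$ corrections to the moments of $\delta_C$. Explicitly, one finds

$$\Delta_C^n m_n^C = \Delta_A^n\left(m_n^A + \frac{\Delta_B^2}{\Delta_A^2}\,N_n\,m_{n-2}^A + \mathcal{O}\left(\frac{\Delta_B^3}{\Delta_A^3}\right)\right), \qquad (A.6)$$

with $N_n = n!/(2(n-2)!)$. At this point, the corrections to all moments should in principle be kept.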
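Eq. (A.6) can be checked on a concrete case. For illustration only, assume that both $\delta_A$ and $\delta_B$ follow a flat pdf with zero mean and unit variance. The exact moments then follow from the binomial expansion of $(\Delta_A\delta_A + \Delta_B\delta_B)^n$. In that expansion, the $k = 2$ term carries the coefficient $\binom{n}{2} = N_n$. Because this $\pi_B$ is symmetric, the neglected remainder is of order $\Delta_B^4$ rather than $\Delta_B^3$. The function names are hypothetical.

```python
import math


def uniform_moment(n):
    """n-th moment of a zero-mean, unit-variance uniform variable on [-sqrt(3), sqrt(3)]."""
    return 0.0 if n % 2 else 3.0 ** (n / 2) / (n + 1)


def exact_moment(n, delta_a, delta_b, m=uniform_moment):
    """E[(Delta_A delta_A + Delta_B delta_B)^n] for independent delta_A, delta_B."""
    return sum(math.comb(n, k) * delta_a ** (n - k) * delta_b ** k * m(n - k) * m(k)
               for k in range(n + 1))


def approx_moment(n, delta_a, delta_b, m=uniform_moment):
    """Eq. (A.6) without its remainder: leading term plus the Delta_B^2/Delta_A^2 correction."""
    correction = math.comb(n, 2) * m(n - 2) if n >= 2 else 0.0   # N_n = n(n-1)/2
    return delta_a ** n * (m(n) + (delta_b / delta_a) ** 2 * correction)


for n in range(1, 7):
    print(n, exact_moment(n, 1.0, 0.1), approx_moment(n, 1.0, 0.1))
```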

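We then take a second step in our approximation, by considering that the amount of information relevant for our problem somehow decreases with the order of the moment. As a consequence, the corrections to the lowest moments are the most relevant. Keeping the next-to-leading corrections up to order $p$, our approximation scheme thus reads

$$\Delta_C^n m_n^C = \begin{cases} \Delta_A^n\left(m_n^A + \dfrac{\Delta_B^2}{\Delta_A^2}\,N_n\,m_{n-2}^A + \mathcal{O}\left(\dfrac{\Delta_B^3}{\Delta_A^3}\right)\right) & \text{if } 1 \le n \le p\,, \\[2mm] \Delta_A^n\left(m_n^A + \mathcal{O}\left(\dfrac{\Delta_B^2}{\Delta_A^2}\right)\right) & \text{if } p < n\,. \end{cases} \qquad (A.7)$$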


In particular, truncating the corrections at $p = 2$ amounts to taking into account only the correction to the variance,

$$\Delta_C^2 = \Delta_A^2 + \Delta_B^2\,. \qquad (A.8)$$

The other details of the shape remain unperturbed, and it follows that

$$\pi_C = \pi_A + \mathcal{O}\left(\frac{\Delta_B^3}{\Delta_A^3}\,\delta^{(3)},\ \frac{\Delta_B^2}{\Delta_A^2}\,\delta^{(4)}\right). \qquad (A.9)$$

Here $\delta^{(n)}$ is the $n$-th derivative of the Dirac distribution. It arises as the inverse Laplace transform of the $t^n$ term of the moment-generating function (see also Ref. [59]). These $\delta^{(n)}$ should be understood as the leading functional deformations of $\pi_A$. In practice, keeping only the first leading moment appears to be appropriate when $\pi_A$ is a one-parameter pdf. In that case, the parameter characterising $\pi_C$ is identified through the combination of variances. For example, taking the normal distribution $\pi_A = \mathcal{N}(0,\sigma_A^2)$ gives $\sigma_C^2 = \sigma_A^2 + \Delta_B^2$ and $\pi_C = \mathcal{N}(0,\sigma_C^2)$.
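Matching only the variance is exact for normal distributions but not in general, as the footnote on the Gaussian case states. A short sketch makes this visible through the standardised fourth moment $m_4^C$ of $\delta_C$, with $\Delta_C$ fixed by Eq. (A.8). Two choices of the common pdf of $\delta_A$ and $\delta_B$ are compared: standard normal and flat. Both are assumptions made for illustration. For the normal case, $m_4^C$ stays at 3, so $\pi_C$ is still normal. For the flat case, $m_4^C$ moves away from the flat value $9/5$, showing the residual deformation of Eq. (A.9). The numerical inputs and function names are hypothetical.

```python
import math


def gaussian_moment(n):
    """n-th moment of a standard normal variable: (n-1)!! for even n, 0 for odd n."""
    if n % 2:
        return 0.0
    return float(math.prod(range(n - 1, 0, -2)))   # empty product gives 1 for n = 0


def uniform_moment(n):
    """n-th moment of a zero-mean, unit-variance uniform variable on [-sqrt(3), sqrt(3)]."""
    return 0.0 if n % 2 else 3.0 ** (n / 2) / (n + 1)


def standardised_moment(n, delta_a, delta_b, m):
    """m_n^C for independent delta_A, delta_B with moments m, and Delta_C from Eq. (A.8)."""
    raw = sum(math.comb(n, k) * delta_a ** (n - k) * delta_b ** k * m(n - k) * m(k)
              for k in range(n + 1))
    delta_c = math.sqrt(delta_a ** 2 + delta_b ** 2)
    return raw / delta_c ** n


for name, m in (("gaussian", gaussian_moment), ("uniform", uniform_moment)):
    print(name, m(4), standardised_moment(4, 1.0, 0.5, m))
```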

The approach above also extends to correlated variables. The difference with respect to the uncorrelated case is that the moment-generating functions do not factorise, since $\delta_A$, $\delta_B$ now have non-vanishing mixed moments. For example, truncating the corrections at $p = 2$ gives the correction

$$\Delta_C^2 = \Delta_A^2 + \Delta_B^2 + 2\Delta_A\Delta_B\,\rho\,, \qquad (A.10)$$

where $\rho$ ($= m_{1,1}$) is the covariance of $(\delta_A, \delta_B)$. In the limit of full correlation, one has $\rho = 1$, so that $\Delta_C^2 = (\Delta_A + \Delta_B)^2$. Note that when $\rho > \Delta_B/(2\Delta_A)$ in Eq. (A.10), the contribution from the correlation term becomes larger than the contribution from the square term $\Delta_B^2$.
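A quick numerical check of Eq. (A.10) and of the threshold above is sketched below. Only for the Monte Carlo part, it assumes that $\delta_A$ and $\delta_B$ are jointly Gaussian with unit variances and correlation $\rho$. The variance formula itself only uses the first two moments, so it holds for any pdf with these moments. The inputs and function names are hypothetical.

```python
import math
import random


def combined_error(delta_a, delta_b, rho):
    """Eq. (A.10): Delta_C for two unit-variance nuisance parameters with correlation rho."""
    return math.sqrt(delta_a ** 2 + delta_b ** 2 + 2.0 * delta_a * delta_b * rho)


def sampled_error(delta_a, delta_b, rho, n=200_000, seed=1):
    """Monte Carlo spread of Delta_A*delta_A + Delta_B*delta_B with correlated Gaussian deltas."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)   # corr(x, y) = rho
        total += (delta_a * x + delta_b * y) ** 2   # zero mean, so this averages to the variance
    return math.sqrt(total / n)


def correlation_dominates(delta_a, delta_b, rho):
    """True when the cross term 2*rho*Delta_A*Delta_B exceeds Delta_B^2."""
    return 2.0 * rho * delta_a * delta_b > delta_b ** 2


print(combined_error(1.0, 0.3, 0.5), sampled_error(1.0, 0.3, 0.5))
print(combined_error(1.0, 0.3, 1.0))   # full correlation: should match Delta_A + Delta_B
```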

It is worth noticing that, in the Gaussian case, this identification exactly reproduces the corrections to the $m_n$ at any order. This is not true for other distributions.
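Finally, the leading moment approximation also extends to the case of several linear combinations of variables. Here we consider two linear combinations of two variables $\delta_A$, $\delta_B$ with correlation $\rho$. The combinations are defined as

$$\delta_{C_1}\Delta_{C_1} = \delta_A\Delta_{A_1} + \delta_B\Delta_{B_1}\,, \qquad (A.11)$$
$$\delta_{C_2}\Delta_{C_2} = \delta_A\Delta_{A_2} + \delta_B\Delta_{B_2}\,. \qquad (A.12)$$

The variances are found to be

$$\Delta_{C_1}^2 = \Delta_{A_1}^2 + \Delta_{B_1}^2 + 2\rho\,\Delta_{A_1}\Delta_{B_1}\,, \qquad (A.13)$$
$$\Delta_{C_2}^2 = \Delta_{A_2}^2 + \Delta_{B_2}^2 + 2\rho\,\Delta_{A_2}\Delta_{B_2}\,, \qquad (A.14)$$

as in the one-combination case described above. In the case $\Delta_{A_1} \gg \Delta_{B_1}$, $\Delta_{A_2} \gg \Delta_{B_2}$, the correlation coefficient $\rho_{12}$ between $\delta_{C_1}$ and $\delta_{C_2}$ reads

$$\rho_{12} = 1 - \frac{1}{2}\left(1-\rho^2\right)\left(\frac{\Delta_{B_1}}{\Delta_{A_1}} - \frac{\Delta_{B_2}}{\Delta_{A_2}}\right)^2 + \mathcal{O}\left(\frac{\Delta_{B_{1,2}}^3}{\Delta_{A_{1,2}}^3}\right). \qquad (A.15)$$

In the case $\Delta_{A_1} \gg \Delta_{B_1}$, $\Delta_{B_2} \gg \Delta_{A_2}$, the correlation coefficient is instead

$$\rho_{12} = \rho + \left(\frac{\Delta_{B_1}}{\Delta_{A_1}} + \frac{\Delta_{A_2}}{\Delta_{B_2}}\right)\left(1-\rho^2\right) + \mathcal{O}\left(\frac{\Delta_{B_1}^2}{\Delta_{A_1}^2},\ \frac{\Delta_{A_2}^2}{\Delta_{B_2}^2}\right). \qquad (A.16)$$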
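The asymptotic forms (A.15) and (A.16) can be compared with the exact correlation coefficient. The exact value is the covariance $\Delta_{A_1}\Delta_{A_2} + \Delta_{B_1}\Delta_{B_2} + \rho(\Delta_{A_1}\Delta_{B_2} + \Delta_{B_1}\Delta_{A_2})$ of the two combinations divided by $\Delta_{C_1}\Delta_{C_2}$. This assumes unit-variance $\delta_A$, $\delta_B$ as in the convention above. The numerical inputs below are arbitrary illustrations, and the function names are hypothetical.

```python
import math


def exact_rho12(a1, b1, a2, b2, rho):
    """Exact correlation of delta_C1 and delta_C2 from Eqs. (A.11)-(A.14)."""
    cov = a1 * a2 + b1 * b2 + rho * (a1 * b2 + b1 * a2)
    var1 = a1 ** 2 + b1 ** 2 + 2.0 * rho * a1 * b1   # Eq. (A.13)
    var2 = a2 ** 2 + b2 ** 2 + 2.0 * rho * a2 * b2   # Eq. (A.14)
    return cov / math.sqrt(var1 * var2)


def rho12_both_a_dominated(a1, b1, a2, b2, rho):
    """Eq. (A.15), for Delta_A1 >> Delta_B1 and Delta_A2 >> Delta_B2."""
    return 1.0 - 0.5 * (1.0 - rho ** 2) * (b1 / a1 - b2 / a2) ** 2


def rho12_crossed(a1, b1, a2, b2, rho):
    """Eq. (A.16), for Delta_A1 >> Delta_B1 and Delta_B2 >> Delta_A2."""
    return rho + (b1 / a1 + a2 / b2) * (1.0 - rho ** 2)


print(exact_rho12(1.0, 0.01, 1.0, 0.02, 0.5), rho12_both_a_dominated(1.0, 0.01, 1.0, 0.02, 0.5))
print(exact_rho12(1.0, 0.01, 0.01, 1.0, 0.5), rho12_crossed(1.0, 0.01, 0.01, 1.0, 0.5))
```

In the first case, both combinations are dominated by $\delta_A$ and end up almost fully correlated. In the second case, each combination inherits one variable, so $\rho_{12}$ stays close to $\rho$ and is shifted upwards by the small admixtures.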